ABSTRACT


Characterization  And  Detection  Of  Vector-Borne  Diseases  In  Endemic  Transmission 
Areas 

Robin  H.  Miller,  Doctor  of  Philosophy,  2016 

Thesis  directed  by:  V.  Ann  Stewart,  DVM,  PhD,  Professor,  Department  of  Preventive 
Medicine  and  Biostatistics,  Division  of  Tropical  Public  Health 

Vector-borne  diseases  contribute  significantly  to  the  global  burden  of  infectious 
diseases  and  remain  a  major  public  health  challenge  worldwide.  Detection  and 
surveillance  of  the  pathogen  and  vector  are  critical  for  the  control  of  vector-borne 
diseases.  Japanese  encephalitis  virus  (JEV)  and  malaria  cause  a  significant  portion  of 
disease  and  mortality  due  to  vector-borne  diseases  globally.  The  research  described  in 
this  dissertation  aims  to  improve  detection  methods  for  both  the  vector  and  pathogens 
that  facilitate  JEV  and  malaria  transmission  in  an  effort  to  advance  vector-borne  disease 
control  and  potential  elimination.  First,  we  developed  an  ecological  niche  model  to 
estimate the distribution of the JEV vector, Culex tritaeniorhynchus, based on
environmental variables and known vector locations in endemic regions. We analyzed the
overlap between Japanese encephalitis (JE) cases and the predicted distribution of Cx.
tritaeniorhynchus, as well as the prevalence of predicted vector habitat within
rice fields. Our novel ecological niche model can be used to target regions for vector
control  strategies  and  vaccine  campaigns  in  order  to  reduce  JE  disease  burden  within  the 
human  population.  Second,  we  developed  a  novel  real-time  PCR  (qPCR)  assay  for  the 
detection  of  the  newly  described  Plasmodium  ovale  subspecies,  P.  ovale  curtisi  and  P. 
ovale  wallikeri.  Previous  assays  for  the  detection  of  P.  ovale  in  our  laboratory  and  others 
inadvertently  detected  only  one  of  the  two  P.  ovale  subspecies  due  to  genetic 
polymorphisms  in  the  primer  binding  regions,  resulting  in  an  overall  underestimation  of 
P.  ovale  prevalence  in  malaria  endemic  regions.  Our  newly  described  P.  ovale  assay  can 
successfully  detect  both  subspecies  with  high  sensitivity  and  specificity.  Additionally,  we 
report  the  first  evidence  that  both  P.  ovale  curtisi  and  P.  ovale  wallikeri  circulate  in  a 
malaria holoendemic region in Western Kenya. Our P. ovale-specific qPCR assay can be
used  for  future  epidemiological  studies  to  improve  detection  of  P.  ovale  parasites  and 
increase  our  understanding  of  the  contribution  of  this  neglected  malaria  parasite  to  the 
global  malaria  disease  burden.  Finally,  we  utilized  an  amplicon-based  deep  sequencing 
approach  to  detect  multiclonal  P.  falciparum  infections  and  analyze  the  Complexity  of 
Infection  (COI)  based  on  several  demographic  factors  from  samples  collected  in  the 
Democratic  Republic  of  Congo  (DRC).  We  deep  sequenced  the  P.  falciparum  apical 
membrane antigen 1 (pfama1) gene from 79 individual samples and detected 68 unique
pfama1 haplotypes. We found that the majority (64.5%) of individuals had multiclonal
infections (COI>1) and also found no association between P. falciparum COI and age,
sex, HIV status, or geographical location within the DRC. Overall, we report high
pfama1 genetic diversity from asymptomatic malaria infections in the DRC and highlight
the  utility  of  amplicon-based  deep  sequencing  to  detect  low  frequency  strains  (variants) 
and multiclonal malaria infections that have been shown to impact malaria clinical
disease  and  transmission  dynamics.  In  conclusion,  the  research  presented  herein  describes 
both  sensitive  molecular  and  geospatial  tools  that  can  be  utilized  to  improve  detection  of 
malaria  parasites  and  the  JEV  vector,  respectively. 

CHAPTER 1: Introduction


Vector-Borne  Diseases  in  Humans 

Vector-borne  diseases  are  globally  distributed,  cause  severe  morbidity  and 
mortality, disproportionately affect the global poor, and contribute to the cycle of poverty
(14; 275). Vector-borne diseases account for more than 17% of all infectious diseases in
humans  and  result  in  over  1  million  deaths  each  year  worldwide  (17).  Additionally,  the 
World  Health  Organization  (WHO)  estimates  that  over  1  billion  people  in  over  100 
countries  are  infected  with  a  vector-borne  disease,  and  that  over  half  of  the  world’s 
population  is  at  risk  for  contracting  a  vector-borne  disease  (14).  A  summary  of  selected 
vector-borne  diseases  that  occur  in  humans,  shown  in  Table  1,  highlights  the  diversity  of 
vectors,  pathogens,  and  diseases  that  together  constitute  a  major  global  health  challenge. 

Vector-borne  diseases  are  significant  contributors  to  emerging  infectious  diseases 
(EIDs)  globally  and  present  several  challenges  that  impede  control  efforts  (112).  The 
ability  of  vector-borne  diseases  to  evade  control  strategies  is  due  to  several  factors.  First, 
vector-borne  disease  mitigation  depends  on  appropriate  and  sustained  vector  control 
strategies  that  take  into  account  the  particular  vector  habitat  and  life  cycle  (14;  112). 
These vector control strategies can be complicated by anthropogenic land-use changes
to the environment, such as urbanization, agriculture, and deforestation, that impact vector
prevalence  and  can  facilitate  the  emergence  of  vector-borne  diseases  into  novel 
environments  (113;  243;  315).  Additionally,  climate  change  is  postulated  to  result  in 
changes to vector habitat and distribution that may contribute to the expansion of vector-borne
diseases into naive populations (113; 243; 247; 277). Second, controlling
vector-borne diseases is challenging because many have a sylvatic cycle
and  infect  non-human  animal  reservoirs.  Vector-borne  diseases  have  a  range  of  natural 
reservoir  hosts,  including  birds,  bats,  pigs,  primates,  rodents  and  several  other  vertebrate 
species  (39).  Controlling  vector-borne  diseases  therefore  requires  identification  of  natural 
reservoir  hosts  and  subsequent  detection  of  the  pathogen  in  these  hosts  to  determine 
strategies  to  reduce  human  vector-borne  disease  prevalence.  Third,  vaccines  are  not 
commercially  available  for  most  vector-borne  diseases,  further  confounding  efforts  to 
control these pathogens (14). Vaccine development is particularly difficult for vector-borne
protozoan pathogens, such as malaria, which cause a significant proportion of
vector-borne  disease  infections  and  deaths  worldwide  (4;  296).  Lastly,  detection  of 
vector-borne  disease-causing  pathogens  and  their  associated  vectors  requires  sensitive 
methods  for  pathogen  detection  and  field  resources  for  entomological  surveys  in  order  to 
determine  the  prevalence  of  vector-borne  disease  within  an  endemic  region  and  to 
identify  and  control  a  vector-borne  disease  outbreak  (160).  As  vector-borne  diseases 
continue  to  spread  into  new  geographical  locations  and  human  populations,  sensitive 
pathogen  detection  and  vector  surveillance  methods  are  important  tools  for  monitoring 
the  emergence  of  vector-borne  diseases  (160).  Detection  of  vector-borne  disease 
associated  pathogens  and  vectors  can  be  hindered  due  to  lack  of  resources  and  training  to 
perform  current  pathogen  detection  methods,  poor  sensitivity  and/or  specificity  of 
pathogen  detection  tools,  and  lack  of  resources  and  training  for  entomological  surveys  (7; 
14).  Based  on  these  factors  and  others,  control  of  vector-borne  diseases  remains  a 
difficult  and  important  global  challenge. 

Epidemiological Triad


The  epidemiological  triad  (Figure  1)  is  an  infectious  disease  model  that  illustrates 
the  required  combination  of  host,  pathogen,  environment,  and  vector  to  cause  disease  (7; 
78).  Complexities  in  the  epidemiological  triad  model  arise,  as  different  vector-borne 
diseases  require  varying  factors  in  terms  of  host,  vector,  pathogen,  and  environment  for 
disease  to  occur  (78).  In  addition,  the  host,  vector,  pathogen  and  environmental 
conditions  that  contribute  to  disease  can  vary  based  on  geographic  location  (78).  Japanese 
encephalitis  and  malaria,  both  mosquito-borne  pathogens  with  distinct  epidemiological 
factors  that  cause  disease  in  humans,  are  discussed  in  greater  detail  below. 

The  Epidemiological  Triad:  Japanese  Encephalitis 

Disease 

The  majority  of  Japanese  encephalitis  virus  (JEV)  infections  are  asymptomatic, 
and  severe  Japanese  encephalitis  (JE)  cases  are  estimated  to  occur  in  1  out  of  every  250 
JEV  infections  (21).  The  WHO  estimates  approximately  68,000  clinical  JE  cases  occur 
every  year  and  JE  remains  a  major  cause  of  viral  encephalitis,  particularly  in  children,  in 
endemic  Asian  countries  (15).  The  case  fatality  rate  for  clinical  JE  is  estimated  to  be 
about  30%,  and  neurological  sequelae  occur  in  30-50%  of  recovered  individuals  (15). 
Clinical  JE  symptoms  include  fever,  chills,  headache,  myalgia,  and  vomiting,  which  can 
progress  to  more  severe  symptoms  such  as  changes  in  mental  status,  confusion,  seizures, 
and  coma  (20;  21).  Despite  the  lack  of  specific  treatment  for  JE,  vaccines  are  available 
and  have  been  shown  to  reduce  cases  and  deaths  due  to  JEV  in  endemic  regions  (164; 
255). 
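
As a rough worked example of how the figures above relate to one another, the sketch below combines the cited estimates (about 68,000 clinical cases per year, roughly one severe case per 250 infections, and a case fatality rate of about 30%). It assumes that these three estimates can simply be multiplied together, which the cited sources do not state, and the function name `je_burden` is hypothetical.

```python
def je_burden(clinical_cases, infections_per_case, case_fatality_rate):
    """Back-of-envelope JE burden from the estimates quoted in the text."""
    # Assumption: every clinical case corresponds to `infections_per_case` infections.
    implied_infections = clinical_cases * infections_per_case
    # Assumption: the case fatality rate applies uniformly to all clinical cases.
    implied_deaths = clinical_cases * case_fatality_rate
    return implied_infections, implied_deaths


infections, deaths = je_burden(68_000, 250, 0.30)
print(f"Implied JEV infections per year: {infections:,.0f}")
print(f"Implied JE deaths per year: {deaths:,.0f}")
```

Under these assumptions the estimates imply on the order of 17 million JEV infections and roughly 20,000 deaths per year; these are illustrative magnitudes only, not figures reported by the cited sources.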

Host


JEV  Life  Cycle 

Humans  are  dead-end  hosts  for  JEV  as  levels  of  viremia  in  humans  are  too  low 
for  transmission  to  biting  mosquitoes  (reviewed  in  (315)).  JEV  is  maintained  in  an 
enzootic  cycle  in  which  wading  birds  and  pigs  serve  as  amplifying  hosts  and  reservoirs 
for  the  mosquito  vector  (158). 

Pathogen 

Japanese  Encephalitis  Virus 

Japanese  encephalitis  virus  (JEV)  is  a  member  of  the  Flaviviridae  family  that 
includes  other  arboviruses  such  as  West  Nile  Virus  (WNV)  and  Dengue  virus  (DENV). 
JEV is a zoonotic virus and comprises five genotypes (I-V), often referred to as
strains,  which  are  restricted  to  different  geographical  locations  throughout  the  JE  endemic 
region  (108). 

JEV  Detection 

As  JE  symptoms  are  indistinguishable  from  other  causes  of  acute  encephalitis,  the 
WHO  recommends  laboratory  diagnosis  for  individuals  that  meet  the  clinical  case 
definition  of  Acute  Encephalitis  Syndrome  (AES)  (123).  The  standard  diagnostic  test  for 
JEV  infection  is  an  enzyme-linked  immunosorbent  assay  (ELISA)  for  the  detection  of 
JEV-specific  IgM  in  the  cerebrospinal  fluid  (CSF),  preferably,  or  in  serum  if  CSF  is  not 
available  (123). 

Vector


Mosquitoes in the genus Culex, predominantly Cx. tritaeniorhynchus, transmit
JEV  (56).  Cx.  tritaeniorhynchus  is  found  throughout  Southeast  Asia  and  in  several 
countries  in  sub-Saharan  Africa  and  the  Middle  East  (1).  Breeding  habitats  include 
shallow  pools,  rice  paddies,  swamps,  streams,  marshes,  puddles,  and  other  temporary 
water  sources  (1).  Cx.  tritaeniorhynchus  females  are  highly  zoophilic  and  prefer  to  feed 
on  pigs,  cattle,  birds,  and  other  animals  instead  of  humans  (229). 

Geographic  information  systems  (GIS)  are  powerful  tools  that  allow  for 
visualization  and  spatial  modeling  of  disease  vectors.  GIS  can  utilize  large  data  sets 
consisting of vector collection coordinates, remote-sensing data from satellite imagery,
and  spatial  modeling  programs  to  display  and  even  predict  vector  distribution  with  high 
resolution  (reviewed  in  (219)  and  (135)).  GIS  maps  can  be  used  to  target  regions  for 
vector  control  based  on  vector  prevalence  and  abundance.  Accordingly,  the  WHO 
recommends  using  GIS  in  vector  control  programs,  particularly  in  resource-limited 
settings where entomological surveys are not practical (171). GIS are also useful for
producing visual ecological niche models that predict the distribution of vector species
based  on  previously  collected  geospatial  (occurrence)  points  and  environmental  habitat 
preference  (313).  Several  studies  have  shown  the  utility  of  GIS  and  ecological  niche 
modeling  of  vector  distribution  based  on  climatic  factors  and  land  use  data  to  assess 
vector-borne  pathogen  risk  for  many  vector-borne  diseases  in  a  range  of  locations 
worldwide  (32;  70;  115;  156;  190;  205;  226).  Overall,  GIS  are  robust  mapping  tools  that 
allow  for  the  visualization,  analysis,  and  statistical  modeling  of  geospatial  data  as  it 
relates  to  vector-borne  disease  prevalence  and  transmission. 
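
To make the idea of an ecological niche model concrete, the following is a minimal sketch of a climate-envelope (BIOCLIM-style) model, one of the simplest niche-modeling approaches. It is not the modeling method used in this dissertation; the environmental variables, the toy values, and the helpers `fit_envelope` and `is_suitable` are all hypothetical and chosen only for illustration.

```python
def fit_envelope(occurrences):
    """Return per-variable (min, max) ranges observed at known vector locations."""
    variables = occurrences[0].keys()
    return {
        v: (min(o[v] for o in occurrences), max(o[v] for o in occurrences))
        for v in variables
    }


def is_suitable(envelope, site):
    """Predict a site as suitable if every variable falls inside the envelope."""
    return all(lo <= site[v] <= hi for v, (lo, hi) in envelope.items())


# Hypothetical environmental values at sites where the vector was collected.
occurrences = [
    {"mean_temp_c": 24.0, "annual_rain_mm": 1500.0},
    {"mean_temp_c": 27.5, "annual_rain_mm": 2200.0},
    {"mean_temp_c": 26.0, "annual_rain_mm": 1800.0},
]
envelope = fit_envelope(occurrences)

# Hypothetical unsampled grid cells to classify.
grid = {
    "cell_a": {"mean_temp_c": 25.0, "annual_rain_mm": 1700.0},
    "cell_b": {"mean_temp_c": 12.0, "annual_rain_mm": 400.0},
}
predicted = {name: is_suitable(envelope, site) for name, site in grid.items()}
print(predicted)
```

An envelope model only asks whether a location resembles places where the vector has already been found; more sophisticated niche models additionally weight variables and estimate a continuous suitability score.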

Environment


Proximity  to  rice  fields  has  been  shown  to  increase  the  risk  of  JEV  transmission  to 
humans,  as  rice  fields  are  the  preferred  habitat  for  Cx.  tritaeniorhynchus  breeding  (141). 
The presence of amplifying hosts, such as pigs and large wading birds, has also been
shown  to  increase  the  risk  of  JEV  transmission  due  to  high-level  JEV  viremias  in  these 
reservoir  species  (158). 

JEV  is  an  emerging  virus  throughout  much  of  Southeast  Asia  and  the  Western 
Pacific  (173).  JEV  is  thought  to  have  originated  in  Indonesia  and  Malaysia  and  later 
emerged  in  Japan,  China,  and  Korea  in  the  late  1800s  (92;  315).  JEV  spread  further  into 
Thailand  and  Vietnam  and  more  recently  into  Nepal,  India,  and  Pakistan  (92;  315). 
Additional  studies  indicate  JEV  is  now  expanding  east  into  New  Guinea  and  south  into 
northern  Australia  (118;  173).  JEV  has  emerged  dramatically  across  Southeast  Asia  and 
the Western Pacific, often following the expansion of rice cultivation and pig
farming,  and  is  currently  a  major  risk  for  viral  encephalitis  throughout  the  entire  endemic 
region  (311). 

The  Epidemiological  Triad:  Malaria 

Disease 

Uncomplicated  clinical  malaria  is  classically  described  as  an  initial  prodromal 
period  followed  by  alternating  cycles  of  fever  and  chill  paroxysms.  Uncomplicated 
malaria can be caused by any of the malaria parasite species and consists of several non-specific
symptoms such as shivering, nausea, vomiting, headache, diarrhea, and body
aches  (37).  Severe  malaria  refers  to  clinical  malaria  with  evidence  of  vital  organ 
dysfunction  such  as  convulsions,  respiratory  distress,  shock,  kidney  failure,  jaundice, 
prostration, severe anemia, and coma (11). Although the majority of severe malaria cases
are  due  to  Plasmodium  falciparum,  P.  vivax  and  P.  knowlesi  also  can  cause  severe 
malaria  and  death  (11). 

Vector 

Mosquitoes  of  the  genus  Anopheles  transmit  human  malaria.  Although  the  genus 
Anopheles comprises over 430 species, only 70 species can transmit malaria, of
which  approximately  40  species  are  considered  major  vectors  for  human  malaria 
transmission  globally  (257).  Dominant  malaria  vector  species  vary  by  geographical 
location.  For  instance,  An.  arabiensis  dominates  in  parts  of  eastern  Africa,  while  An. 
darlingi  is  a  major  malaria  vector  in  South  America  (144).  In  addition  to  geographical 
variation,  anopheline  species  have  varying  feeding  behaviors,  such  as  preference  for 
feeding  inside  (endophagic)  or  outside  (exophagic),  and  preference  for  human 
(anthropophagic)  or  animal  (zoophagic)  hosts  (144).  Anopheline  species  also  exhibit 
widely  varied  preferences  for  larval  habitats,  on  the  basis  of  factors  such  as  salinity 
levels,  sunlight,  vegetation  cover,  and  natural  versus  artificial  water  sources  (144).  Based 
on  the  geographical  and  behavioral  differences  of  anopheline  mosquitoes,  it  is  important 
to  identify  the  dominant  and  secondary  malaria  vectors  contributing  to  malaria 
transmission  in  an  endemic  region  for  control  measures  to  be  effective. 

Environment 

The environmental factors that contribute to malaria disease transmission depend
largely on the specific characteristics of the malaria endemic region and the behaviors
and ecological requirements of the anopheline vector species in that region. Several
climatic factors, including rainfall, temperature, and humidity, are associated with malaria
transmission as they influence vector breeding preferences (257). Agricultural practices
have  also  been  shown  to  influence  anopheline  vector  prevalence  in  some  malaria  endemic 
regions.  For  example,  the  presence  of  rice  fields  has  been  shown  to  increase  malaria 
transmission  intensity  in  Southeast  Asia  (257),  but  had  no  effect  on  malaria  transmission 
in Côte d’Ivoire (77). In addition to ecological and climatic factors, regions of social or
political disturbance are also associated with increased malaria transmission (221). Thus,
there is a complex relationship between climate, ecology, anthropogenic factors, and
anopheline  vector  behavior  that  together  can  influence  malaria  transmission. 

Host 

Malaria  endemicity  is  classified  into  four  levels:  holoendemic,  hyperendemic, 
mesoendemic,  and  hypoendemic  (Table  2).  Regions  with  holoendemic  malaria 
transmission  typically  experience  intense,  stable  malaria  transmission  year  round,  and 
thus  individuals  living  in  these  regions  are  routinely  infected  with  malaria  parasites  (121). 
Clinical  malaria  in  holoendemic  regions  is  typically  restricted  to  children  less  than  five 
years  old  and  pregnant  women  (269;  272).  Older  children  and  adults  develop  naturally 
acquired  immunity  (NAI)  that  protects  against  clinical  malaria,  but  as  NAI  to  malaria  is 
non-sterile,  these  clinically  protected  individuals  still  maintain  asymptomatic  malaria 
infections  (82;  154).  The  majority  of  malaria  infections  in  holoendemic  regions  are 
therefore  asymptomatic,  and  these  asymptomatically  infected  individuals  are  important 
reservoirs  for  malaria  transmission  (29;  51;  153).  This  is  in  contrast  to  regions  with  less 
malaria transmission (hyperendemic, mesoendemic, and hypoendemic, see Table 2), in
which  individuals  of  all  ages  are  at  risk  for  developing  clinical  malaria  and  asymptomatic 
infections  are  less  common  (208). 

Pathogen

Malaria  Life  Cycle 

The  complex  malaria  life  cycle  includes  ten  parasite  morphological  changes  that 
occur in several different types of host tissue in both the anopheline mosquito
(definitive  host)  and  the  human  (intermediate  host)  (174).  When  an  infectious  female 
anopheline  mosquito  takes  a  blood  meal  from  a  susceptible  human  host,  motile  malaria 
sporozoites  are  injected  into  the  skin.  During  the  exo-erythrocytic  stage  of  the  life  cycle, 
sporozoites  travel  to  the  liver,  invade  hepatocytes,  and  develop  first  into  liver 
trophozoites  and  then  into  liver  schizonts.  Although  malaria  parasite  development  during 
the exo-erythrocytic stage typically takes 7-14 days, both P. ovale and P. vivax can
develop  into  hypnozoites  that  remain  dormant  in  the  liver  for  months  to  years  after  the 
initial  infection  and  cause  relapsing  malaria.  In  all  human  malaria  species,  liver  schizonts 
eventually  rupture  the  hepatocyte  and  release  merozoites  that  enter  the  blood  stage, 
initiating  the  erythrocytic  stage  of  the  life  cycle.  Merozoites  quickly  invade  the  host  red 
blood  cells,  where  they  develop  into  trophozoites  and  then  into  schizonts.  The  malaria 
schizont  then  ruptures  the  red  blood  cell  and  merozoites  are  released,  which  invade  new 
red blood cells and initiate a subsequent round of the erythrocytic cycle. Depending on
the  malaria  parasite  species,  the  erythrocytic  cycle  takes  approximately  24-72  hours 
(Table  3).  A  subset  of  malaria  parasites  during  the  erythrocytic  cycle  will  develop  into 
both  male  and  female  gametocytes,  the  malaria  parasite  stage  infectious  for  the  female 
anopheline  mosquito  (261).  The  malaria  life  cycle  stages  that  occur  in  the  mosquito  are 
referred  to  as  the  sporogonic  cycle,  which  begins  when  an  anopheline  mosquito  takes  a 
blood  meal  containing  malaria  gametocytes.  Once  in  the  anopheline  mosquito  midgut,  the 
male and female malaria gametocytes fuse to form the zygote. The zygote undergoes
meiosis  and  genetic  recombination  while  in  the  mosquito  midgut.  The  zygote  then 
develops  into  the  motile  ookinete,  which  invades  the  mosquito  midgut  wall.  The  ookinete 
transforms  into  an  oocyst,  which  continues  to  grow  and  divide  in  the  midgut  wall  until  it 
ruptures  and  releases  sporozoites.  Sporozoites  invade  the  mosquito  salivary  glands, 
allowing  for  transmission  to  the  human  host  during  the  next  blood  meal  (261). 

Malaria  Parasite  Complexity 

Five  different  malaria  species  infect  humans:  P.  falciparum,  P.  vivax,  P.  malariae, 
P.  knowlesi,  and  the  newly  described  P.  ovale  subspecies:  P.  ovale  curtisi  and  P.  ovale 
wallikeri  (Table  3)  (82;  213).  In  malaria  endemic  regions,  individuals  can  be  infected 
with  multiple  malaria  parasite  species  at  the  same  time  (53;  54;  322).  These  multi-species 
infections  have  been  shown  to  influence  both  clinical  outcomes  and  responses  to 
antimalarial  treatment  (177;  264).  For  example,  case  studies  in  non-immune  travelers 
returning  from  malaria  endemic  regions  have  documented  patients  initially  diagnosed  and 
treated for P. falciparum experiencing relapsing malaria due to a non-P. falciparum
malaria species (64; 186; 256). In contrast to treatment for P. falciparum, the relapsing malaria
parasite  species  P.  vivax,  P.  ovale  curtisi,  and  P.  ovale  wallikeri  must  be  treated  with 
primaquine  to  ensure  dormant  hypnozoites  are  cleared  from  the  liver.  Proper  detection 
and  diagnosis  of  these  complex  malaria  infections  is  therefore  critical  to  ensure  proper 
antimalarial  treatment  is  provided. 

In  addition  to  infection  of  multiple  malaria  species,  humans  can  also  be  infected 
with  multiple  strains  of  a  single  malaria  species  (31;  178;  265).  The  number  of  distinct 
malaria  strains  infecting  a  single  individual  is  referred  to  as  the  Complexity  of  Infection 
(COI) (44). Multiclonal (COI>1) malaria infections have been shown to influence clinical
disease and impact malaria transmission dynamics, and may be a useful marker for malaria
transmission  intensity  in  endemic  areas  (26;  52;  67;  72;  97).  Detection  of  multiclonal 
infections  requires  sensitive  molecular  techniques  that  can  detect  multiple  genetically 
related  strains  that  occur  at  varying  frequencies,  including  low-level  minor  variants, 
within  an  individual  (31;  104;  134;  266). 
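
As an illustration of how COI can be derived from amplicon deep sequencing data, the sketch below counts the haplotypes in each sample whose within-sample read frequency meets a minimum threshold. The read counts, the 1% frequency threshold, and the helper `complexity_of_infection` are hypothetical choices made for illustration and are not values or methods taken from this work.

```python
def complexity_of_infection(haplotype_reads, min_freq=0.01):
    """Count haplotypes whose share of a sample's reads is at least min_freq."""
    total = sum(haplotype_reads.values())
    if total == 0:
        return 0
    return sum(1 for n in haplotype_reads.values() if n / total >= min_freq)


# Hypothetical per-sample read counts for each detected haplotype.
samples = {
    "s1": {"hapA": 950, "hapB": 50},
    "s2": {"hapA": 1000},
    "s3": {"hapA": 600, "hapC": 395, "hapD": 5},
}
coi = {sid: complexity_of_infection(reads) for sid, reads in samples.items()}

# Fraction of samples classified as multiclonal (COI > 1).
multiclonal_fraction = sum(c > 1 for c in coi.values()) / len(coi)
print(coi, multiclonal_fraction)
```

The frequency threshold is the key design choice: set too low, sequencing errors are counted as minor variants; set too high, genuine low-frequency strains are missed.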

Malaria  Detection  Methods 

A  summary  of  malaria  detection  methods  is  shown  in  Table  4.  Microscopic 
diagnosis  of  malaria  remains  the  “gold  standard”  for  malaria  detection  (320).  Despite  the 
widespread  use  and  utility  of  microscopic  diagnosis  of  malaria  parasites,  microscopy 
poses  several  challenges  for  malaria  detection  and  diagnosis.  For  example,  proper 
microscopic  diagnosis  requires  highly  trained  and/or  expert  microscopists  who  must 
undergo  periodic  evaluation  to  ensure  quality  control  (QC)  and  quality  assurance  (QA)  of 
the  malaria  diagnostic  program  (214;  320).  Several  studies  have  shown  that  misdiagnosis 
of  malaria  species  can  occur  using  light  microscopic  detection,  especially  in  the  context 
of  mixed  species  infections  (211).  Highly  skilled  microscopists  can  detect  low-level 
malaria  parasites,  although  the  accuracy  of  microscopic  detection  depends  on  the 
expertise  of  the  microscopist  and  training  methods  (209;  211;  214). 

Malaria  Rapid  Diagnostic  Tests  (mRDTs)  are  increasingly  used  as  a  method  for 
malaria detection and diagnosis (reviewed in (43)). mRDTs detect malaria antigen from
a small volume of patient blood based on a lateral flow immunochromatographic test
strip  (16).  Over  200  different  mRDTs  are  now  commercially  available  (16).  Yet,  the 
majority  of  mRDTs  detect  P.  falciparum  only,  while  a  select  few  can  detect  P. 
falciparum plus additional non-falciparum species (16). The use of mRDTs for malaria
detection and diagnosis has several advantages over microscopy. The training and skill level
required for mRDT administration are minimal, and the tests do not require laboratory space or
electricity, highlighting the utility of mRDTs in resource-limited areas (43). Despite the
uptake  and  usage  of  mRDTs  in  several  countries  as  part  of  their  malaria  control 
programs,  there  also  remain  pitfalls  of  mRDT  usage  for  malaria  detection  and  diagnosis 
(43;  176).  These  include  the  inability  of  some  mRDTs  to  detect  low-level  malaria 
infections  and  the  failure  to  detect  non-falciparum  malaria  species,  resulting  in  false 
negative  results  that  can  delay  malaria  drug  treatment  (43;  111). 

The  application  of  Polymerase  Chain  Reaction  (PCR)  for  the  detection  of  malaria 
parasites  has  increased  the  ability  of  researchers  and  clinicians  to  detect  low-level  and 
submicroscopic  malaria  parasitemias  (248;  319).  PCR  detection  of  low-level  malaria 
parasitemias  is  a  critical  tool  for  both  understanding  malaria  epidemiology  and 
facilitating  malaria  control.  As  more  malaria  endemic  regions  enter  control  and 
elimination  phases,  PCR  detection  of  low-level  malaria  parasitemias  is  a  sensitive 
surveillance tool to monitor malaria transmission (119; 128). Furthermore, PCR is often
utilized  for  early  detection  of  low-level  drug  resistant  malaria  parasites  that  persist  after 
antimalarial  treatment,  which  is  critical  for  monitoring  the  spread  and  emergence  of 
malaria  drug  resistance  (8).  In  addition  to  increased  sensitivity  for  the  detection  of 
malaria  parasites,  PCR  assays  that  target  species-specific  nucleic  acid  sequences  can 
detect  and  differentiate  mixed  malaria  species  infections  that  may  be  missed  or 
undetected based on light microscopy and mRDT (266). The development of real-time
PCR (qPCR) assays has further expanded the utility of these molecular-based assays for
malaria detection by increasing specificity through the use of probe-based technology
(248). 
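
The sensitivity and specificity referred to when comparing detection methods can be computed from a 2x2 comparison against a reference method. The following is a minimal sketch with hypothetical counts, in which microscopy results are scored against PCR as the reference; the function name and all numbers are illustrative assumptions rather than data from this work.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from 2x2 counts against a reference test."""
    sensitivity = tp / (tp + fn)  # share of reference-positive samples detected
    specificity = tn / (tn + fp)  # share of reference-negative samples called negative
    return sensitivity, specificity


# Hypothetical counts: microscopy scored against PCR as the reference method.
sens, spec = sensitivity_specificity(tp=40, fp=5, fn=20, tn=135)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```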

Next-generation  deep  sequencing  technologies  are  also  highly  sensitive  methods 
for  malaria  detection  (192;  310)  and  have  been  shown  to  successfully  detect  low 
frequency  parasites  (134;  310).  Deep  sequencing,  also  referred  to  as  massively  parallel 
sequencing,  refers  to  nucleic  acid  sequencing  methods  that  generate  multiple  sequence 
reads, resulting in high coverage of target sequences. Several next-generation sequencing
platforms  are  currently  available,  each  with  distinct  template/library  preparation 
protocols,  sequencing  chemistries,  detection  methods,  run  times,  output  read  lengths, 
number  of  reads  generated  per  run,  and  cost  (55;  59).  However,  the  overall  theory  behind 
next-generation  sequencing  methods  is  similar.  First,  the  target  nucleic  acid  is  sheared 
and  further  processed  for  library  preparation  (59).  The  library  amplification  step  typically 
includes  adapter  ligation,  immobilization  of  the  fragment  (usually  on  a  bead  or  array), 
and  PCR  amplification.  Next,  the  amplified  libraries  are  used  as  template  for  deep 
sequencing, during which the sequencing platform detects the addition of a nucleotide, usually
by fluorescence, pyrophosphate detection, or proton release (55; 59). Finally, the resulting
sequence reads are trimmed, clustered, aligned, and analyzed using
manufacturer- or institutionally provided software or a
custom bioinformatics analysis pipeline (59). A summary of selected next-generation
sequencing  technologies  is  provided  in  Table  5. 

There  are  several  advantages  of  using  next-generation  deep  sequencing  for 
malaria  genomics  studies  (129;  310),  including  detection  of  drug  resistant  loci, 
characterization  of  low  frequency  malaria  parasitemias  in  multiclonal  malaria  infections, 
malaria whole genome sequencing, population genetic studies, and malaria transmission
surveillance (71; 72; 104; 129; 134; 169; 192; 310). The ongoing significant decreases in
cost associated with next-generation sequencing further highlight its potential to
enhance  malaria  detection  and  aid  malaria  control  efforts  (129). 

Molecular-based detection methods, such as PCR, qPCR, and next-generation
sequencing technologies, are more sensitive for the detection of low-level parasitemias,
mixed malaria species infections, and multiclonal malaria parasite infections than
traditional methods, such as microscopy and mRDTs. As detection of the malaria parasite
is  critical  for  malaria  control,  utilizing  sensitive  molecular  tools  will  likely  improve 
current  malaria  detection  and  surveillance  methods.  Further,  GIS  can  be  utilized  to  map 
the  geographical  locations  of  malaria  parasites  based  on  the  results  of  malaria  detection 
assays  or  using  mathematical  models  that  estimate  the  distribution  of  malaria  parasite 
species  and  strains  (47;  126;  180;  258;  291). 

Applying  The  Epidemiological  Triad  Model  for  JEV  and  Malaria  Control 

Control  of  vector-borne  pathogens  requires  an  understanding  of  the  host, 
pathogen,  environment,  and  vector  that  cause  disease.  In  terms  of  public  health 
nomenclature,  pathogen  control  refers  specifically  to  the  reduction  of  disease  morbidity 
and  mortality,  prevalence,  and  incidence  within  a  geographical  region  (83).  Pathogen 
elimination  requires  reducing  disease  incidence  to  zero  over  a  sustained  period  of  time 
within  a  geographical  region  (83).  Both  pathogen  control  and  elimination  require 
sustained  intervention  strategies  to  prevent  pathogen  reintroduction  and  disease 
reemergence (83). Pathogen eradication refers to the permanent global reduction of disease
incidence to zero, after which continued intervention strategies are no longer required (83).

Vector-borne diseases have successfully been targeted for control and elimination
in  several  countries  around  the  world,  and  typically  require  a  multifaceted  approach  that 
includes  detection  and  surveillance  of  the  pathogen  in  vectors,  hosts  (human  and 
reservoir)  and  the  environment.  For  example,  Mexico  recently  validated  the  elimination 
of  river  blindness  (onchocerciasis)  using  a  PCR  assay  designed  to  detect  Onchocerca 
volvulus  in  the  black  fly  vector  (244).  To  successfully  eliminate  lymphatic  filariasis, 
several  countries,  including  China  and  the  Republic  of  Korea,  utilized  geospatial 
surveillance  to  identify  regions  for  targeted  mass  drug  administration  (MDA)  and 
monitor  for  reemergence  of  the  pathogen  in  vector  and  human  populations  after  MDA 
(130).  Additionally,  the  WHO  recommends  that  countries  in  the  malaria  elimination  stage 
conduct  disease  surveillance,  which  includes  detection  of  the  malaria  parasite  in  the 
human  host  and  also  monitoring  the  mosquito  vector  habitat  and  breeding  sites  (10). 

Malaria  and  JEV  are  two  important  pathogens  that  together  cause  a  substantial 
proportion  of  morbidity  and  mortality  attributed  to  vector-borne  diseases  worldwide.  The 
epidemiological triad model for disease control demonstrates that detection of both the
pathogen and vector is critical for mitigating disease caused by JEV and malaria.
Improvement  in  pathogen  detection,  through  the  use  of  highly  sensitive  molecular  assays, 
and  vector  surveillance,  through  geospatial  modeling,  will  help  to  better  understand  how 
these  factors  contribute  to  disease  and  provide  a  framework  for  the  control  and  possible 
elimination  of  JEV  and  malaria. 

Research Goal, Objectives, and Rationale

Research Goal

The  overall  goal  of  the  research  in  this  dissertation  is  to  improve  the  detection  of 
pathogens  and  vectors  that  cause  vector-borne  diseases  in  humans. 

Research  Objectives 

The  objectives  of  this  dissertation  research  are  to  1)  utilize  geospatial  tools  to 
estimate  JEV  vector  prevalence  and  2)  develop  highly  sensitive  molecular  tools  to  detect 
malaria  parasite  species  and  strains  in  complex  infections. 

Rationale 

Importance  of  Vector  Detection  for  Disease  Control 

Understanding  the  prevalence  and  distribution  of  the  vector  is  critically  important 
for controlling vector-borne disease and pathogen transmission. Vector surveillance can
be  used  to  detect  the  emergence  of  a  new  vector  into  an  environment  and  also  for 
monitoring  the  effectiveness  of  vector  control  strategies,  such  as  insecticides  (24;  116; 
142;  155;  237).  Vector  detection  is  conducted  using  a  variety  of  methods,  including  field 
surveys  using  adult  stage  traps,  immature  stage  (larval  and  pupal)  surveys,  and  host 
landing  and  biting  counts  (22;  196;  257).  Additionally,  GIS  can  map  vector  geospatial 
sampling  data  and  estimate  disease  risk  within  an  endemic  region  based  on  ecological 
niche  models  (219;  313). 

Importance  of  Pathogen  Detection  for  Disease  Control 

As  shown  by  the  epidemiological  triad,  the  pathogen  is  also  a  critical  factor  in 
disease  causation.  Detection  of  the  pathogen  within  the  host,  vector,  and/or  environment 
is key for understanding the prevalence and distribution of a vector-borne pathogen.
Detection and surveillance of vector-borne pathogens provide important epidemiological
information that can then be utilized for public health efforts to mitigate disease and
prevent transmission (6; 8; 10). Vector-borne pathogen surveillance requires detection
tools that can identify the pathogen, even at low levels (14). Highly sensitive
nucleic acid-based detection methods, such as qPCR and deep sequencing, have shown
increased  capabilities  in  detecting  low-level  pathogen  infections  and  therefore  provide 
valuable  information  about  pathogen  prevalence  that  can  facilitate  control  of  vector-borne 
diseases  (134;  136;  169;  178;  267). 

Aim 1

Develop  an  ecological  niche  model  to  estimate  the  JEV  vector  Cx. 
tritaeniorhynchus  prevalence  in  Japanese  encephalitis  endemic  regions. 

Culex  tritaeniorhynchus  geospatial  collection  data  and  environmental  variables 
will  be  utilized  to  develop  an  ecological  niche  model  for  Japanese  encephalitis  virus 
(JEV)  vector  prevalence  in  the  Japanese  encephalitis  (JE)  endemic  region.  The  ecological 
niche  model  will  be  compared  to  locations  of  documented  JE  clinical  cases  and  rice  fields 
using  GIS.  Analysis  of  environmental  and  climatic  factors  that  contributed  significantly 
to  the  ecological  niche  model  will  be  performed  in  order  to  investigate  abiotic  factors  that 
influence  JEV  vector  distribution.  The  ecological  niche  model  for  Cx.  tritaeniorhynchus 
will  identify  regions  with  an  increased  risk  of  the  dominant  JEV  vector  and  will  also 
provide  a  framework  for  targeting  vector  control  programs  and  vaccine  administration  to 
control  JE  disease  in  human  populations. 

Aim 2


Improve  detection  of  P.  ovale  in  malaria  endemic  regions  by  targeting  a 
conserved  genetic  region  between  P.  ovale  curtisi  and  P.  ovale  wallikeri  subspecies 
using  a  real-time  PCR  (qPCR)  platform. 

P.  ovale  is  a  neglected  malaria  parasite  species  that  was  recently  discovered  to 
exist  as  two  genetically  distinct  subspecies:  P.  ovale  curtisi  and  P.  ovale  wallikeri. 
Previous molecular-based methods for P. ovale detection inadvertently targeted only one
of the two subspecies, thereby failing to detect all P. ovale infections. An inclusive real-time
PCR (qPCR) assay will be developed based on a conserved genetic region between
P.  ovale  curtisi  and  P.  ovale  wallikeri  in  order  to  detect  both  subspecies.  Several  qPCR 
validation  steps  will  be  performed  in  order  to  evaluate  both  sensitivity  and  specificity  of 
the  assay.  Additionally,  the  presence  of  both  P.  ovale  curtisi  and  P.  ovale  wallikeri  in  a 
malaria  holoendemic  region  in  Kenya  will  be  investigated  using  a  multilocus  genotyping 
approach. The P. ovale-specific qPCR assay will allow for detection of both P. ovale
subspecies,  thereby  improving  detection  of  a  neglected  but  clinically  relevant  malaria 
parasite  species. 

Aim  3 

Utilize  a  deep  sequencing  approach  to  detect  multiclonal  P.  falciparum 
infections. 

Multiclonal  P.  falciparum  infections  can  impact  malaria  clinical  outcomes, 
influence  within-host  transmission  dynamics,  and  indicate  malaria  transmission  intensity. 
Detection of multiclonal P. falciparum remains a challenge due to lack of sensitive tools
to detect low frequency strains (variants) within a multiclonal infection. An amplicon-based
deep sequencing approach targeting the P. falciparum apical membrane antigen 1
(pfama1) gene will be utilized to explore multiclonal P. falciparum infections in the
Democratic Republic of Congo (DRC). Amplicon-based deep sequencing allows for
sensitive detection of minor variant strains within a multiclonal P. falciparum infection
due to the extensive coverage of the target pfama1 sequence. Several epidemiological
factors associated with multiclonal infections (complexity of infection, COI >1) will be analyzed, including age,
sex,  HIV  status,  and  geographical  location  in  the  DRC.  Detection  of  complex  P. 
falciparum  infections  can  be  utilized  to  better  understand  the  relationship  of  multiclonal 
P.  falciparum  infections  and  clinical  disease  and  also  potentially  as  a  marker  for 
transmission  dynamics  in  malaria  control  settings. 
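
As an illustration only, the idea of calling an infection multiclonal from amplicon read counts can be sketched in Python. The function name, the haplotype labels, the read counts, and the 1% minimum frequency cutoff are all hypothetical and are not taken from this dissertation; real pipelines also need error correction before haplotypes are counted.

```python
def complexity_of_infection(haplotype_reads, min_freq=0.01):
    """Count haplotypes whose within-sample read frequency meets a cutoff.

    haplotype_reads: dict mapping a haplotype label to its read count.
    min_freq: hypothetical minimum frequency used to discard likely noise.
    Returns (COI, {haplotype: frequency}) for the retained haplotypes.
    """
    total = sum(haplotype_reads.values())
    if total == 0:
        return 0, {}
    kept = {}
    for hap, n in haplotype_reads.items():
        freq = n / total
        if freq >= min_freq:
            kept[hap] = freq
    return len(kept), kept


# Made-up read counts for one sample
reads = {"hap_A": 9500, "hap_B": 450, "hap_C": 50}
coi, freqs = complexity_of_infection(reads)
is_multiclonal = coi > 1
print(coi, is_multiclonal)  # 2 True
```

In this toy sample the minor haplotype at 4.5% is retained while the one at 0.5% falls below the assumed cutoff, so the infection is scored as COI = 2.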

Table 1. Summary of Selected Vector-Borne Diseases in Humans.
Table adapted from the WHO: A Global Brief on Vector-Borne Diseases (14).

Vector | Pathogen | Disease
Mosquitoes | Plasmodium spp., dengue virus, yellow fever virus, chikungunya virus, West Nile virus, Wuchereria bancrofti, Brugia malayi, B. timori, Japanese encephalitis virus | Malaria, dengue, yellow fever, chikungunya, West Nile virus, lymphatic filariasis, Japanese encephalitis
Sandflies | Leishmania spp. | Leishmaniasis
Triatomine bugs | Trypanosoma cruzi | Chagas disease
Ticks | Borrelia burgdorferi, Crimean-Congo hemorrhagic fever virus, Rickettsia rickettsii | Lyme disease, Crimean-Congo hemorrhagic fever, Rocky Mountain spotted fever
Fleas | Yersinia pestis, Rickettsia typhi | Plague, murine typhus
Flies | Trypanosoma brucei gambiense, T. b. rhodesiense, Onchocerca volvulus | Human African trypanosomiasis, onchocerciasis (river blindness)
Freshwater snails | Schistosoma mansoni, S. japonicum, S. haematobium | Schistosomiasis/bilharzia

Table 2. Malaria endemicity levels.
Modified from (268).

Endemicity level | Holoendemic | Hyperendemic | Mesoendemic | Hypoendemic
Parasite prevalence (2-9 year olds) | >75%* | >50% | 11-50% | 0-10%

* In infants (0-11 months)

Table 3. Defining characteristics of the five human malaria parasites (82).

Characteristic | P. falciparum | P. vivax | P. malariae | P. ovale | P. knowlesi
Geographic range | Pan-tropical | Pan-tropical, temperate | Pan-tropical | Africa, Southeast Asia | Southeast Asia
Prevalence | High | High | Low | Low | Low
Pre-erythrocytic stage (day) | 5-7 | 6-8 | 14-16 | 9 | 8-9
Erythrocytic cycle (hr) | 48 | 48 | 72 | ~48 | 24
Severe malaria | +++ | +++ | + | + | +++
Drug resistance | Yes | Yes | No | No | No
Relapse | No | Yes | No | Yes | No

Table 4. Malaria detection methods.
Table adapted from (285).

Method | Detection limit (parasites/µl) | Expertise | Time (min) | Cost
Light microscopy | Expert: 5-10; Non-expert: >50 | Medium to high | <60 | Low
Malaria Rapid Diagnostic Tests (mRDTs) | 50-100 | Low | <15 | Moderate
PCR/qPCR | Range: 0.002 to <1 | High | <45 to >360 | High

Table 5. Summary of next-generation sequencing technologies.
Adapted from (55; 59).

Manufacturer | Platform | Detection | Run time | Max read length (bp) | Usable reads per run (millions)
Life Technologies/Applied Biosystems | Ion Torrent PGM | Proton release | 2 h | 400 | 4
Life Technologies/Applied Biosystems | 5500 SOLiD | Fluorescence detection | 8 d | ~75 | >700
Illumina | HiSeq2000 | Fluorescence detection | 8.5 d | ~100 | 3000
Illumina | MiSeq | Fluorescence detection | 27 h | ~150 | 7
Roche/454 Life Sciences | 454 FLX | Pyrophosphate detection | 10 h | 600 | 1
Roche/454 Life Sciences | 454 FLX+ | Pyrophosphate detection | 23 h | 1000 | 1
Pacific Biosciences | RS II | Fluorescence detection | 0.5-2 h | 50% reads >10 kb | 0.8

Figure 1. Epidemiological triad. Adapted from (78).

Abstract


Background 

Culex tritaeniorhynchus is the primary vector of Japanese encephalitis virus
(JEV),  a  leading  cause  of  encephalitis  in  Asia.  JEV  is  transmitted  in  an  enzootic  cycle 
involving  large  wading  birds  as  the  reservoirs  and  swine  as  amplifying  hosts.  The 
development  of  a  JEV  vaccine  reduced  the  number  of  JE  cases  in  regions  with 
comprehensive  childhood  vaccination  programs,  such  as  in  Japan  and  the  Republic  of 
Korea.  However,  the  lack  of  vaccine  programs  or  insufficient  coverage  of  populations  in 
other  endemic  countries  leaves  many  people  susceptible  to  JEV.  The  aim  of  this  study 
was  to  predict  the  distribution  of  Culex  tritaeniorhynchus  using  ecological  niche 
modeling. 

Methods/Principal  Findings 

An  ecological  niche  model  was  constructed  using  the  Maxent  program  to  map  the 
areas  with  suitable  environmental  conditions  for  the  Cx.  tritaeniorhynchus  vector. 
Program  input  consisted  of  environmental  data  (temperature,  elevation,  rainfall)  and 
known  locations  of  vector  presence  resulting  from  an  extensive  literature  search  and 
records  from  MosquitoMap.  The  statistically  significant  Maxent  model  of  the  estimated 
probability  of  Cx.  tritaeniorhynchus  presence  showed  that  the  mean  temperatures  of  the 
wettest  quarter  had  the  greatest  impact  on  the  model.  Further,  the  majority  of  human 
Japanese  encephalitis  (JE)  cases  were  located  in  regions  with  higher  estimated  probability 
of  Cx.  tritaeniorhynchus  presence. 

Conclusions/Significance

Our  ecological  niche  model  of  the  estimated  probability  of  Cx.  tritaeniorhynchus 
presence  provides  a  framework  for  better  allocation  of  vector  control  resources, 
particularly  in  locations  where  JEV  vaccinations  are  unavailable.  Furthermore,  this  model 
provides  estimates  of  vector  probability  that  could  improve  vector  surveillance  programs 
and  JE  control  efforts. 

Author  summary 

Japanese  encephalitis  virus  (JEV)  is  transmitted  predominately  by  the  mosquito, 
Culex  tritaeniorhynchus.  The  primary  reservoirs  of  the  virus  are  wading  birds,  with  swine 
serving  as  amplifying  hosts.  Despite  the  development  of  a  JEV  vaccine,  people  remain 
unvaccinated  in  endemic  countries  and  are  susceptible  to  JEV  infection.  The  distribution 
of  the  JEV  vector(s)  provides  essential  information  for  preventive  measures.  This  study 
used  an  ecological  niche  modeling  program  to  predict  the  distribution  of  Cx. 
tritaeniorhynchus  based  on  collection  records  and  environmental  maps  (climate,  land 
cover,  and  elevation).  The  model  showed  that  the  mean  temperatures  of  the  wettest 
quarter had the greatest impact on the model. Of the 25 countries endemic for Japanese
encephalitis (JE), seven possessed greater than 50% land area with an
estimated  high  probability  of  Cx.  tritaeniorhynchus  presence.  Our  model  provides  a 
useful  tool  for  JEV  surveillance  programs  that  focus  on  vector  control  strategies. 

Introduction 

Japanese  encephalitis  virus  (JEV),  the  causative  agent  of  Japanese  encephalitis 
(JE),  is  an  arbovirus  that  belongs  to  the  family  Flaviviridae  and  is  endemic  to  Southeast 
and Northeast Asia, the Pacific Islands, and northern Australia (Figure 2) (124). The
primary vector of JEV is Culex tritaeniorhynchus Giles, but other Culex species (e.g.,
Culex annulirostris, Culex vishnui Theobald, Culex bitaeniorhynchus Giles, and Culex
pipiens  Linnaeus)  have  also  been  implicated  as  important  viral  transmitters  (35;  48;  56; 
241).  The  larval  habitat  of  Cx.  tritaeniorhynchus  is  primarily  low  lying  flooded  areas 
containing  grasses  and  flooded  rice  paddies,  but  this  species  can  also  be  found  in  urban 
environments  in  close  proximity  to  human  populations  (239).  Within  the  past  40  years, 
rice  agriculture  in  JEV  endemic  countries  has  increased  by  20%,  thereby  expanding  Cx. 
tritaeniorhynchus  habitat  and  increasing  human  risk  of  exposure  to  vector  populations 
(141). 

Swine,  including  domestic  and  feral  pigs,  serve  as  amplifying  hosts  of  JEV  in 
endemic  areas.  The  proximity  of  human  populations  to  pig  farms,  sties  or  feral  pig 
populations  increases  the  risks  of  JEV  exposure  (143;  253).  Ardeid  birds  (large  wading 
birds)  are  an  important  JEV  reservoir  and  can  spread  JEV  to  new  regions  through  their 
northern  migration  to  breeding  and  feeding  grounds  in  the  spring  and  southern  return  in 
the  fall  (56).  Additional  animals  have  been  identified  as  host  species  for  JEV,  including 
domesticated  animals  (chickens,  goats,  cows,  and  dogs),  as  well  as  bats,  flying  foxes, 
ducks,  snakes  and  frogs.  However,  these  are  considered  dead-end  hosts  as  they 
infrequently  develop  sufficient  viremias  to  infect  mosquito  vectors  (228;  299;  307;  312). 

Despite the introduction of an effective vaccine to the public in the mid-1900s,
JEV  remains  the  leading  cause  of  viral  encephalitis  globally  (303).  Comprehensive 
vaccination  programs  in  Japan,  Republic  of  Korea  (ROK),  Brunei,  Australia,  and 
Malaysia  have  significantly  reduced  the  number  of  human  cases  (222).  Rare  occurrences 
of  neurological  complications  associated  with  the  mouse-brain  derived  JEV  vaccine 
interrupted vaccination programs in some regions, raising concerns about the reemergence
of  JEV  in  an  unvaccinated  and  non-immune  population  (150;  234).  The  prevalence  of  JE 
is  higher  in  countries  with  lower  socioeconomic  status,  when  compared  to  more  affluent 
neighboring  countries,  indicating  the  importance  of  economic  and  social  stability  as 
additional  risk  factors  that  impact  the  transmission  and  prevalence  of  JE  in  non-immune 
populations  (222). 

Recent  developments  in  the  field  of  ecological  niche  modeling  and  the 
development  of  global  environmental  data  sets  have  resulted  in  the  ability  to  predict  the 
distribution  of  vector  populations  that  directly  relate  to  transmission  of  viruses,  parasites, 
and fungal pathogens and that affect animal and human health. Modeling to estimate the
distribution  of  disease  vectors  provides  useful  information  in  disease-endemic  areas,  in 
addition  to  predicting  how  anthropogenic  changes  to  the  environment  will  affect  disease 
presence  (109;  175;  181;  182). 

In  the  current  study,  the  Maxent  ecological  niche  modeling  program  was  utilized 
to  model  the  distribution  of  the  primary  vector  of  JEV,  Cx.  tritaeniorhynchus  (230).  The 
resulting  vector  habitat  suitability  map  was  compared  to  the  reported  locations  of  JE 
human  cases  and  the  current  status  of  established  JE  vaccination  programs  by  country. 
Our  ecological  niche  model  can  be  used  by  public  health  officials  and  government 
agencies  in  endemic  regions  to  guide  implementation  of  comprehensive  vaccination 
programs,  vector  control  strategies,  and  public  health  awareness  campaigns. 

Methods


Culex  tritaeniorhynchus  Data  Collection 

Geographical  coordinates  of  known  Cx.  tritaeniorhynchus  records  were  identified 
by  performing  a  literature  search  in  PubMed  for  all  previous  field  collection  studies. 
When  exact  geographical  coordinates  were  not  provided,  locations  were  approximated  by 
searching  for  the  given  city,  town,  or  village  using  Google  Earth  software.  Further 
geographical  data  points  for  the  distribution  of  Cx.  tritaeniorhynchus  were  obtained 
through  MosquitoMap  (http://www.mosquitomap.org/),  a  database  of  spatial  data  points 
of  mosquitoes  that  is  maintained  by  the  Walter  Reed  Biosystematics  Unit,  Smithsonian 
Support  Center,  Silver  Hill,  MD.  Additional  Cx.  tritaeniorhynchus  collection  data  were 
obtained  from  Force  Health  Protection  and  Preventive  Medicine,  65th  Medical  Brigade, 
Yongsan  Army  Garrison,  ROK. 

In  previous  modeling  work  in  the  ROK  (181),  we  found  that  a  large  number  of 
collection  records  in  a  limited  geographical  area  biased  the  model.  As  a  result  of  the  large 
number  of  collection  sites  for  the  ROK  (96  unique  locations),  we  reduced  the  number  of 
records  to  23  by  deleting  all  but  one  randomly  selected  record  per  administrative  district. 
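
The thinning step described above can be sketched as follows. This is a minimal illustration, assuming each record is tagged with its administrative district; the function name, the record layout, the district labels, and the coordinates are hypothetical and do not reproduce the actual collection data.

```python
import random
from collections import defaultdict


def thin_records(records, seed=0):
    """Keep one randomly selected record per administrative district.

    records: iterable of (district, latitude, longitude) tuples.
    seed: fixed so that the random selection is reproducible.
    """
    rng = random.Random(seed)
    by_district = defaultdict(list)
    for rec in records:
        by_district[rec[0]].append(rec)
    # Iterate districts in sorted order so the output order is stable
    return [rng.choice(by_district[d]) for d in sorted(by_district)]


# Made-up example records
records = [
    ("district_A", 37.50, 127.00),
    ("district_A", 37.52, 127.03),
    ("district_B", 35.10, 129.04),
]
thinned = thin_records(records)
print(len(thinned))  # 2
```

Applied to the ROK data, the same procedure would reduce the 96 unique locations to one record for each of the 23 administrative districts.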

Identification  of  Japanese  Encephalitis  (JE)  Human  Cases 

Approximate  locations  of  known  human  JE  cases  were  determined  using  locations 
provided  in  ProMED  mail  reports  (www.promedmail.org)  from  1994  through  2010 
(Figure  3).  Additional  locations  of  confirmed  JE  cases  were  also  determined  through  a 
PubMed  literature  search.  Exact  geographical  coordinates  were  not  reported  for  most 
documented  human  cases  and  were  therefore  extrapolated  using  the  Google  Earth 
software to obtain the latitude/longitude coordinates of the reported city, town, or village
in  which  JE  was  documented. 

Vaccination  Programs  in  JEV  Endemic  Countries 

JEV  vaccination  program  information  was  obtained  from  “Japanese  Encephalitis 
Morbidity,  Mortality  and  Disability:  Reduction  and  Control  by  2015”  published  in  2009 
by  the  Program  for  Appropriate  Technology  in  Health  (PATH),  Armed  Forces  Research 
Institute of Medical Sciences, and BIKEN (222). Additional JEV vaccination program
information was obtained from the WHO/IVB database (216). Countries lacking a
JEV vaccination program were identified using information in the above-mentioned
publications and confirmed with additional literature searches. A summary of these data
is provided in Table 6.

Environmental  Data 

One  kilometer  resolution  climate  and  elevation  data  were  obtained  from 
WorldClim  (http://www.worldclim.org/bioclim).  The  WorldClim  organization  has 
processed  50  years  of  ground-based  weather  measurements  to  produce  mean  monthly 
minimum  and  maximum  temperatures  and  precipitation  in  a  grid  format  at  several 
different  resolutions.  The  data  were  further  processed  to  produce  bioclimatic  variables 
(e.g.,  mean  temperatures  of  the  wettest  quarters).  For  this  project,  the  highest  resolution 
data  available  from  WorldClim  (approximately  1  km)  were  downloaded.  In  addition  to 
bioclimatic variables, global elevation data, re-sampled to 1-km resolution from NASA's
Shuttle Radar Topography Mission (SRTM), were obtained from WorldClim. Descriptions
of  the  bioclimatic  and  elevation  variables  used  for  this  study  are  listed  in  Table  7.  To 
better  understand  the  effect  of  each  environmental  variable  on  Cx.  tritaeniorhynchus 
distribution, the values of each environmental layer at each site were extracted using
ArcGIS  (ESRI,  Redlands,  California,  www.esri.com).  This  allowed  for  a  comparison  to 
the  known  environmental  and  distribution  limitations  for  Cx.  tritaeniorhynchus  in  the 
literature. 

A map of rice growing areas was created by processing GeoCover-LC (Land
Cover) data from MDA Information Systems, Inc.
(http://www.mdafederal.com/geocover/geocoverlc). GeoCover was created by
processing  Landsat  Thematic  Mapper  images  to  create  land  cover  maps  for  most  areas  of 
the  world.  Each  pixel  within  the  GeoCover-LC  represents  30  by  30  meters.  To  convert 
the  image  to  a  resolution  that  could  be  used  in  the  Maxent  model,  ArcGIS  was  used  to 
count  the  number  of  rice  pixels  within  each  square  kilometer  (33  by  33  pixels).  Then,  a 
rice percentage was calculated for each square kilometer (number of rice pixels divided
by the total number of pixels in that square kilometer) and stored in a final output image.
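
The aggregation from 30-m land cover pixels to a 1-km rice percentage was performed in ArcGIS; the NumPy sketch below only illustrates the arithmetic. It assumes the land cover image has already been reduced to a 0/1 rice mask, and it trims any edge pixels that do not fill a complete 33 by 33 block. The function name and the toy mask are hypothetical.

```python
import numpy as np


def rice_percentage(rice_mask, block=33):
    """Percent of rice pixels in each block x block cell of a 0/1 mask."""
    rows, cols = rice_mask.shape
    # Drop edge pixels that do not fill a whole block (assumption of this sketch)
    rows -= rows % block
    cols -= cols % block
    trimmed = rice_mask[:rows, :cols]
    # Split the grid into (cell_row, pixel_row, cell_col, pixel_col)
    cells = trimmed.reshape(rows // block, block, cols // block, block)
    rice_count = cells.sum(axis=(1, 3))
    return 100.0 * rice_count / (block * block)


# Toy mask covering a 2 x 2 grid of 1-km cells
mask = np.zeros((66, 66), dtype=np.uint8)
mask[:33, :33] = 1     # top-left cell is entirely rice
mask[33:, 33:50] = 1   # 17 of 33 columns in the bottom-right cell are rice
pct = rice_percentage(mask)
```

Here `pct[0, 0]` is 100 and `pct[1, 1]` is 17/33 of 100, or about 51.5; the other two cells contain no rice.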

Ecological  Niche  Model 

The Maxent 3.2.1 modeling program (http://www.cs.princeton.edu/~schapire/maxent/)
was utilized to model the distribution of Cx. tritaeniorhynchus based on previously
obtained  geographical  locations.  Maxent  utilizes  a  maximum  entropy  algorithm  to 
analyze  values  of  environmental  layers,  such  as  temperature,  precipitation,  and  elevation, 
at  known  locations  of  species  occurrence  (collection  records)  to  estimate  the  probable 
range  of  the  species  over  a  geographic  region  (230;  231).  This  model  is  based  on 
presence-only  data  instead  of  presence/absence  data  due  to  the  lack  of  available  absence 
data.  Although  absence  data  can  be  informative  for  modeling,  ecological  niche  models 
based  on  presence-only  data  are  useful  in  regions  with  limited  collection  data  (230). 
Without absence data, the true probability of presence cannot be modeled. In Maxent,
which uses presence-only data, the species distribution is output as an estimated
probability map (163).

The  Maxent  program  calculates  the  importance  of  environmental  variables  in 
developing  predictive  species  distribution  models  by  using  the  jackknife  test  of  variable 
importance.  The  jackknife  test  runs  the  model  1)  once  with  all  variables,  2)  dropping  out 
each  variable  in  turn,  and  3)  with  a  single  variable  at  a  time.  Variables  are  considered 
important if they produce high training gains when used alone in a model. A variable is also
important  if  the  training  gain  is  low  when  the  variable  is  removed  from  the  model  (230). 
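
The logic of the jackknife test can be sketched in Python. This is not Maxent's implementation: the `gain` argument is a hypothetical stand-in for fitting a model on a subset of variables and reading its training gain, and the numbers in `toy_gain` are invented so that two climate layers share redundant information while elevation carries information found nowhere else.

```python
def jackknife(variables, gain):
    """Evaluate gain with all variables, with each one alone, and with each one removed."""
    full_gain = gain(tuple(variables))
    report = {}
    for name in variables:
        others = tuple(v for v in variables if v != name)
        report[name] = {
            "gain_alone": gain((name,)),
            "gain_without": gain(others),
        }
    return full_gain, report


def toy_gain(subset):
    # Made-up values: bio08 and bio12 share 0.6 units of redundant information
    unique = {"bio08": 0.2, "bio12": 0.1, "elevation": 0.5}
    shared = 0.6 if ("bio08" in subset or "bio12" in subset) else 0.0
    return shared + sum(unique[v] for v in subset)


full_gain, report = jackknife(["bio08", "bio12", "elevation"], toy_gain)
best_alone = max(report, key=lambda v: report[v]["gain_alone"])
most_unique = min(report, key=lambda v: report[v]["gain_without"])
print(best_alone, most_unique)  # bio08 elevation
```

The toy example shows why the two criteria can disagree: a variable can give the highest gain on its own yet be largely replaceable by a correlated layer, while a different variable causes the largest loss when it is removed.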

Maxent  utilizes  two  approaches  to  validate  the  accuracy  of  the  model.  The  first 
method  randomly  selects  occurrence  points  to  be  withheld  from  the  model  building  to  use 
as  testing  points.  Using  multiple  definitions,  a  set  of  thresholds  split  the  continuous 
probability  values  of  the  model  into  ‘predicted  presence  or  absence’  categories.  Maxent 
then  calculates  the  p-value  based  on  the  null  hypothesis  that  testing  points  will  be 
predicted  as  “present”  no  better  than  by  a  random  model.  The  second  method  calculates 
the Area Under the Curve (AUC) of the receiver operating characteristic (ROC), a
graphical depiction of the sensitivity versus one (1) minus the specificity of the model
often used to validate ecological niche models (230; 278). The AUC indicates whether
the model predicts species location better than a random distribution. AUC values of ≤0.5
indicate a prediction no better than random, and AUC values >0.9 indicate high reliability of the
model (230). To determine the best combination of environmental data for modeling, the
model  was  run  four  times  using  different  sets  of  input  layers  each  time:  1)  bioclimatic 
layers, elevation and rice crop data, 2) bioclimatic layers and elevation data, 3)
bioclimatic  layers  only,  and  4)  elevation  data  only. 
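
For readers unfamiliar with the AUC, the sketch below computes it as the fraction of (test presence, comparison point) pairs in which the presence site receives the higher model score, with ties counted as one half. This pairwise form is equivalent to the area under the ROC curve. The scores are made up, and the use of background points in place of true absences is an assumption appropriate to presence-only modeling rather than a description of the exact Maxent computation.

```python
def auc(presence_scores, background_scores):
    """Rank-based AUC: chance that a presence site outscores a background site."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5  # ties count as half
    return wins / (len(presence_scores) * len(background_scores))


# Made-up model outputs at withheld presence sites and at background sites
presence = [0.9, 0.8, 0.4]
background = [0.7, 0.3, 0.2, 0.1]
value = auc(presence, background)  # 11 of 12 pairs are ranked correctly
```

A model that scores presence and background sites identically returns 0.5 under this definition, which is the random expectation referred to above.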

Results 

Ecological  Niche  Model  of  Cx.  tritaeniorhynchus 

A  total  of  139  unique  sites  of  documented  Cx.  tritaeniorhynchus  geographical 
locations  were  utilized  to  construct  the  ecological  niche  model  (Figure  4).  Of  the  139 
total  points,  105  (76%)  were  randomly  designated  as  training  points  in  order  to  build  the 
model  and  34  (24%)  points  were  used  to  test  the  model.  The  model  was  run  four  times 
using  different  combinations  of  environmental  layers  (Table  8).  Statistical  results  indicate 
that  the  most  accurate  model  included  bioclimatic  layers  and  elevation  (Table  8),  and 
therefore  this  model  was  used  in  all  subsequent  analyses.  Statistical  evaluation  showed 
the model to have high accuracy, with AUC >0.9 and low p-values. The model is
available  to  view  or  download  from  www.vectormap.org. 

In  order  to  evaluate  the  contribution  of  each  environmental  variable  to  the  model, 
Maxent utilizes a jackknife test, which indicated that the annual precipitation (bio12)
environmental  layer  is  the  environmental  variable  with  the  highest  gain  when  used  in  the 
model  by  itself.  The  Maxent  program  also  calculates  a  percent  contribution  for  each 
variable  in  the  model.  The  annual  precipitation  variable  contributed  16.2%  of  the 
information  used  by  the  model,  another  indication  that  it  is  an  important  environmental 
factor  for  estimating  the  distribution  of  Cx.  tritaeniorhynchus  (Table  7).  The  mean 
temperature  of  the  wettest  quarter  variable  (bio08)  contributed  the  highest  percentage 
(21.7%) of the information to the model. Elevation was also an important variable,
contributing 9.6% to the model. From the jackknife test, if elevation data were removed
from the model, the overall training gain would decrease the most, indicating the
elevation  variable  contained  the  most  unique  information  of  the  variables  in  the  Cx. 
tritaeniorhynchus  distribution  model. 

The  values  of  each  environmental  variable  at  each  recorded  location  of 
occurrence  were  extracted  using  ArcGIS  (Table  7).  For  example,  the  known  locations  for 
Cx. tritaeniorhynchus used in the model fell between 0 and 838 meters of elevation. This is
consistent  with  the  published  reports  that  Cx.  tritaeniorhynchus  is  rarely  collected  above 
1,000  meters  (220;  225). 

Human  JE  Cases  and  Vector  Presence  Estimation 

Ninety-six  reported  JE  case  locations  were  identified  in  endemic  regions  (Figure 
3).  ArcGIS  analysis  categorized  human  JE  cases  based  on  the  estimated  probability  of 
Cx.  tritaeniorhynchus  presence  (Figure  5).  Human  JE  cases  were  identified  at  locations 
with  a  range  of  estimated  probability  of  vector  presence,  including  regions  with  25%  or 
less  estimated  probability.  However,  the  majority  (>75%)  of  human  JE  cases  were 
reported  from  regions  with  greater  than  25%  estimated  probability  of  Cx. 
tritaeniorhynchus  presence.  Limited  availability  of  location  data  of  human  JE  cases 
greatly  impacts  any  associations  between  areas  of  high  estimated  vector  probability  and 
disease.  For  instance,  the  lack  of  human  JE  cases  in  other  regions  of  estimated  high 
probability  of  vector  presence  could  be  due  to  lack  of  reporting,  improper  diagnosis,  or 
due  to  successful  prevention  strategies. 
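
The categorization step can be illustrated with a minimal sketch. It assumes that a probability value has already been extracted at each case location (done in ArcGIS in this study). Only the 25% cut-off comes from the text; the remaining class boundaries, the function names, and the example values are hypothetical and do not represent study data.

```python
from collections import Counter

# Upper bounds of the probability classes. Only the 0.25 cut-off is taken
# from the text; the other boundaries are illustrative assumptions.
CLASS_EDGES = [0.25, 0.50, 0.75, 1.00]
CLASS_LABELS = ["<=25%", ">25-50%", ">50-75%", ">75-100%"]


def classify(probability):
    """Assign one estimated probability (0 to 1) to a class label."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    for edge, label in zip(CLASS_EDGES, CLASS_LABELS):
        if probability <= edge:
            return label
    raise AssertionError("unreachable: CLASS_EDGES must end at 1.0")


def summarize_cases(probabilities, threshold=0.25):
    """Count cases per class and return the fraction above the threshold."""
    counts = Counter(classify(p) for p in probabilities)
    above = sum(1 for p in probabilities if p > threshold)
    return counts, above / len(probabilities)


# Hypothetical probabilities at eight case locations (not study data).
example_cases = [0.10, 0.30, 0.55, 0.80, 0.62, 0.20, 0.91, 0.45]
counts, fraction_above = summarize_cases(example_cases)
print(dict(counts), fraction_above)
```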

Cx.  tritaeniorhynchus  Presence  Estimation  Per  Country 

ArcGIS  analysis  determined  the  approximate  percentage  of  each  country  with 
>25%  probability  of  Cx.  tritaeniorhynchus  presence  based  on  the  Maxent  model  (Table 
6). Of the 25 endemic countries, seven possessed >50% of their land area with a >25%
probability of Cx. tritaeniorhynchus presence. Three countries (Bhutan, Pakistan, and
Russia) possessed <1% of their total country area with a >25% probability of Cx.
tritaeniorhynchus  presence. 
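
The country counts stated above can be verified against Table 6 with a small sketch. The percentages are transcribed from Table 6; rows without a numeric value are omitted, and the variable names are illustrative assumptions.

```python
# Percentage of land area with >25% estimated probability of vector presence,
# transcribed from Table 6. Rows without a numeric value are left out.
percent_area = {
    "Australia": 4.6, "Bangladesh": 56.8, "Bhutan": 0.0, "Brunei": 16.3,
    "Cambodia": 79.4, "China": 3.6, "East Timor": 68.4, "India": 19.2,
    "Indonesia": 14.0, "Japan": 42.4, "Laos": 19.7, "Malaysia": 8.4,
    "Myanmar": 20.5, "Nepal": 2.8, "DPRK": 21.1, "Pakistan": 0.06,
    "Papua New Guinea": 11.1, "Republic of Korea": 78.8, "Russia": 0.01,
    "Singapore": 10.7, "Sri Lanka": 85.2, "Taiwan": 34.2, "Thailand": 80.9,
    "Viet Nam": 61.0,
}

# Countries with more than half of their area above the 25% threshold.
above_half = sorted(c for c, pct in percent_area.items() if pct > 50)

# Countries with less than 1% of their area above the 25% threshold.
below_one = sorted(c for c, pct in percent_area.items() if pct < 1)

print(len(above_half), above_half)
print(below_one)
```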

Discussion 

In  this  study,  a  statistically  significant  ecological  niche  model  for  Cx. 
tritaeniorhynchus  was  developed  using  mosquito  presence  records,  climate,  and  elevation 
variables.  Locations  of  human  cases  of  JE  generally  fell  within  the  higher  probability 
areas  of  Cx.  tritaeniorhynchus  (Figure  5).  Regions  of  estimated  high  probability  of  Cx. 
tritaeniorhynchus  presence  (Figure  4)  are  representative  of  preferred  environments,  based 
on  temperature,  precipitation  and  elevation  where  Cx.  tritaeniorhynchus  habitats  occur. 
This  model  serves  as  a  tool  to  fill  in  knowledge  gaps  regarding  Cx.  tritaeniorhynchus  and 
can be utilized by health care professionals and policy officials in endemic regions to
help guide the development and implementation of disease mitigation strategies.

The  Maxent  program  identifies  important  environmental  variables  that  are  major 
contributors  to  the  vector  distribution  model.  Based  on  the  jackknife  test  of  variable 
importance, annual precipitation (bio12) is an important contributor to the model.
Additionally, the mean temperature of the wettest quarter (bio08) and elevation also
contributed  greatly  to  the  model  for  distribution  of  Cx.  tritaeniorhynchus  (Table  7). 
Previous  studies  that  aimed  to  identify  favorable  ecological  conditions  of  mosquitoes 
found  that  the  optimal  temperature  of  JEV  vectors  is  between  22.8  and  34.5°C  (200).  The 
importance  of  temperature  during  the  wet  season  in  the  model  is  attributed  to 
temperatures and flooded habitats that are optimal for larval development and adult
survival.  Locations  in  which  the  temperatures  do  not  fall  into  the  optimal  range  during 
the  rainy  season  may  therefore  experience  fewer  mosquitoes,  despite  harboring  the 
appropriate  habitat.  Temperature  also  plays  a  role  in  disease  transmission  rates,  as  higher 
temperatures  increase  the  rates  of  virus  replication  and  dissemination,  while  decreasing 
the  time  from  mosquito  infection  to  transmission  of  the  virus  to  animal  and  human  hosts 
(280). 

Sampling  bias  is  an  issue  that  affects  the  accuracy  of  the  model  as  the  model  was 
developed  using  existing  data  from  the  literature  and  VectorMap.  Therefore,  some 
regions  have  not  been  sampled  in  the  study  area  and  some  have  been  oversampled.  Cx. 
tritaeniorhynchus  data  for  China  were  very  limited  (Figure  3),  which  may  mean  that 
some  potential  environmental  conditions  of  Cx.  tritaeniorhynchus  were  not  represented  in 
the  model,  in  particular,  the  cooler  Northeast  region  of  China.  Because  modeling  was 
limited  to  Cx.  tritaeniorhynchus,  there  is  a  potential  that  for  some  regions,  other  primary 
or secondary vectors, i.e., Cx. annulirostris, Cx. bitaeniorhynchus, and Cx. vishnui, may
predominate  and  maintain  transmission  of  JEV  in  these  areas.  Collection  records  of  Cx. 
tritaeniorhynchus  were  obtained  spanning  many  decades  and  at  different  times  during  the 
year, furthering the impact of sampling bias on our model. In addition, the density of Cx.
tritaeniorhynchus was not measured in this study, which is an important limitation because vector
abundance plays a crucial role in disease transmission. Further collection studies are
therefore  needed  to  determine  the  abundance  of  vector  species  in  addition  to  presence  in 
endemic  regions. 

Low-lying flooded areas containing grasses, including rice paddies, are the
primary larval habitats for Cx. tritaeniorhynchus. An increase in the amount of flooded
rice field habitat has been shown to be positively correlated with increases in adult populations
of  Cx.  tritaeniorhynchus  in  the  ROK  (240).  Although  the  rice  map  derived  from  the 
GeoCover  Land  Cover  map  (Figure  6)  does  generally  match  the  predicted  occurrence  of 
Cx.  tritaeniorhynchus,  there  are  some  areas  where  the  model  predicts  the  presence  of  the 
mosquito,  yet  no  rice  crops  were  mapped.  For  some  areas,  rice  may  not  have  been 
identified  correctly  on  the  satellite  images,  since  agricultural  areas  were  limited  or  were 
adjacent  to  other  predominant  habitats.  For  example,  rice  is  produced  in  Nepal  (133),  but 
no  rice  fields  were  identified  by  GeoCover  in  Nepal,  since  the  identification  of  small  rice 
fields in mountainous areas on satellite images can be difficult. Alternatively, this may indicate
that environments other than rice fields are suitable habitat for Cx. tritaeniorhynchus.

The  predicted  probability  of  Cx.  tritaeniorhynchus  presence  values  were  used  to 
determine  the  percentage  of  a  country  at  high  risk  (greater  than  25%  probability)  for 
vector  presence  (Table  6).  Many  Asian  countries  have  high  percentages  of  their  total  land 
area  with  a  >25%  probability  for  the  presence  of  Cx.  tritaeniorhynchus.  Cambodia,  the 
ROK,  Sri  Lanka,  and  Thailand  have  over  75%  of  their  land  area  with  a  >25%  probability 
of Cx. tritaeniorhynchus presence. Additional countries with >50% of their total land area
at >25% probability of Cx. tritaeniorhynchus occurrence include Bangladesh, East
Timor,  and  Vietnam.  However,  some  countries  may  have  small  areas  of  vector  habitat 
close  to  large  populations  that  can  result  in  outbreaks  despite  low  percentage  estimated 
probability  in  the  region  overall.  Further,  this  analysis  does  not  take  into  account  country 
size  or  vector  abundance,  which  would  also  impact  disease  transmission.  Although 
additional factors contribute to JE disease risks, the distribution of the vector populations
within  a  country  is  a  valuable  data  set  when  considering  the  necessity  of  vaccination  and 
other  health  risk  reduction  programs. 
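
The percentage calculation described above can be sketched as a pixel count. This is a simplified illustration, not the ArcGIS workflow used in the study: it assumes the probability surface and a country identifier grid share the same pixel layout, that all pixels cover equal area, and that pixels without data are stored as None. The function name and the small example grids are hypothetical.

```python
def percent_area_above(prob_grid, country_grid, country, threshold=0.25):
    """Percentage of a country's valid pixels whose probability exceeds threshold.

    prob_grid and country_grid are equally sized lists of rows. Pixels with a
    probability of None are treated as no data and excluded from the total.
    """
    total = 0
    above = 0
    for prob_row, id_row in zip(prob_grid, country_grid):
        for prob, country_id in zip(prob_row, id_row):
            if country_id != country or prob is None:
                continue
            total += 1
            if prob > threshold:
                above += 1
    if total == 0:
        raise ValueError(f"no valid pixels found for {country!r}")
    return 100.0 * above / total


# Hypothetical 3 x 3 probability surface and matching country identifiers.
probabilities = [
    [0.10, 0.30, 0.60],
    [None, 0.26, 0.20],
    [0.90, 0.05, 0.40],
]
countries = [
    ["A", "A", "B"],
    ["A", "A", "B"],
    ["B", "B", "B"],
]

pct_a = percent_area_above(probabilities, countries, "A")
pct_b = percent_area_above(probabilities, countries, "B")
print(round(pct_a, 1), round(pct_b, 1))
```

Because the count is a simple proportion, it carries the limitation noted in the text: a small country and a large country with the same percentage can differ greatly in absolute area of predicted habitat.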

Human  JE  cases  were  categorized  based  on  the  estimated  probability  of  vector 
presence  at  the  reported  location  (Figure  5).  Interestingly,  a  portion  of  human  cases  were 
reported  from  regions  with  25%  or  less  estimated  probability  of  Cx.  tritaeniorhynchus 
presence.  JE  cases  not  falling  within  high  probability  pixels  could  have  been  acquired  in 
nearby  locations.  Even  for  precisely  located  case  data,  the  high  resolution  of  the  model 
(one  kilometer  pixels)  increases  the  likelihood  that  the  predicted  Cx.  tritaeniorhynchus 
location  does  not  match  disease  acquisition  location  as  many  people  travel  more  than  a 
kilometer  in  the  course  of  a  typical  day.  Alternatively,  additional  factors  other  than  the 
presence  of  Cx.  tritaeniorhynchus  may  be  important  when  determining  the  risk  of 
disease. For instance, other JEV vectors may dominate in regions with low estimated
Cx. tritaeniorhynchus presence. JE is one of many febrile illnesses that affect human
populations  in  Asia.  Difficulties  arise  in  diagnosis  of  JE  in  patients  based  on  symptoms 
alone  that  range  from  mild  to  very  severe,  with  laboratory  tests  required  for  confirmation. 
Obtaining  geographical  data  of  where  human  cases  were  acquired  is  made  difficult  due  to 
lack  of  confident  diagnoses,  patient  travel  history,  and  spatial  data.  The  lack  of  precision 
of  the  reported  case  locations  may  also  contribute  to  lower  numbers  of  JE  cases  falling 
within  high  probability  Cx.  tritaeniorhynchus  pixels.  Identification  of  human  JE  cases  in 
this  study  is  extremely  limited  and  does  not  represent  all  human  cases  in  JE  endemic 
regions.  In  many  cases,  only  a  village  or  city  name  was  given  for  reported  cases.  A 
previous  study  to  model  the  distribution  of  Cx.  tritaeniorhynchus  to  predict  JE  in  the 
Republic of Korea found human cases to occur in areas of high estimated probability of
vector  presence  (181).  This  study,  however,  utilized  intensive  vector  collection  methods 
and  JE  case  data  were  obtained  from  the  Korea  Centers  for  Disease  Control  resulting  in 
an  overall  more  extensive  and  accurate  model.  This  illustrates  the  need  for  increased 
surveillance  of  vector  and  human  JE  cases  in  order  to  generate  more  accurate  risk  models 
for  JE.  In  order  to  evaluate  the  impact  of  vector  presence  on  the  risk  of  JE  in  humans, 
comprehensive  efforts  to  identify  specific  locations  of  both  symptomatic  and 
asymptomatic  JE  cases  across  endemic  regions  are  needed. 

Ecological  niche  modeling  inherently  possesses  limitations  in  that  it  makes 
predictions  based  solely  on  environmental  variables  that  impact  larval  development  and 
adult  survival.  Other  important  factors  that  influence  vector  distributions  include:  vector 
control  strategies,  public  health  campaigns,  socioeconomic  status,  human  population 
densities,  anthropogenic  changes  to  land  (creation  of  vector  habitat),  vector  species 
competition,  and  predator  influences  on  their  potential  distribution  and  population 
densities.  Further,  the  use  of  WorldClim  data  may  underestimate  or  ignore  environmental 
variables  that  occur  during  a  short  time  period  or  transient  habitat  suitable  for  the  vector 
to  survive.  Incorporation  of  these  variables  will  undoubtedly  increase  the  validity  of  the 
model.  These  factors  are  also  important  to  take  into  consideration  when  implementing 
mosquito  control  initiatives  and  vaccination  campaigns. 

The  reemergence  of  JEV  remains  possible  due  to  multiple  factors.  Increases  in  the 
pig  farming  industry,  modification  and  expansion  of  arable  lands  for  wetland  rice 
farming, and an unvaccinated or non-immune fraction of the population, in combination with
optimal climatic conditions, contribute to the potential for periodic outbreaks of JE such as the
one observed in the ROK (143). Genotype analyses of circulating JEV strains identified
the  reemergence  of  genotype  V,  which  was  unseen  in  Asia  for  over  50  years  (162;  283). 
The  identification  of  emerging/reemerging  JEV  strains  is  important  for  vaccine 
development  and  the  implementation  of  effective  vaccination  programs.  Increased 
surveillance  in  areas  with  known  vector  populations  and  additional  risk  factors,  such  as 
reservoir  and  amplifying  hosts,  will  aid  in  the  identification  of  circulating  JEV  strains  as 
well  as  strains  that  are  emerging  in  novel  human  and  vector  populations.  Understanding 
the  vector  distribution  is  a  key  step  to  effectively  understanding  JEV  risks  and  also  to 
preventing  additional  outbreaks  of  JE  in  endemic  countries. 

Table 6. Summary of JEV vaccination programs in endemic countries and predicted
percentage of land with greater than 25% estimated probability of Cx.
tritaeniorhynchus presence based on the ecological niche model.

JEV endemic country             JE vaccination program status¹                Area >25% probability (%)
------------------------------------------------------------------------------------------------------
Australia                       Administered in endemic areas                 4.6
Bangladesh                      No Current Immunization Program               56.8
Bhutan                          No Current Immunization Program               0
Brunei                          No Current Immunization Program               16.3
Cambodia                        No Current Immunization Program               79.4
China                           National Vaccination Program 2010             3.6
East Timor                      No Current Immunization Program               68.4
India                           Vaccine administered in high risk areas, not  19.2
                                integrated into routine immunization program
Indonesia                       No Current Immunization Program               14
Japan                           National Vaccination Program 2010             42.4
Laos                            No Current Immunization Program               19.7
Malaysia                        Regional vaccination                          8.4
Myanmar                         No Current Immunization Program               20.5
Nepal                           Vaccine introduced in 2006, not widely        2.8
                                implemented
Democratic People’s Republic    DPRK originated vaccine provided in high      21.1
of Korea                        risk areas
Pakistan                        No Current Immunization Program               0.06
Papua New Guinea                No Current Immunization Program               11.1
Philippines                     No wide scale vaccination program in place,
                                vaccine trial in progress
Republic of Korea               Government mandated mass immunization began   78.8
                                in 1971
Russia                          No data                                       0.01
Singapore                       No data                                       10.7
Sri Lanka                       18 of 26 districts receive vaccine annually,  85.2
                                plans to extend to all districts
Taiwan                          National Vaccination Program                  34.2
Thailand                        National Vaccination Program 2010             80.9
Viet Nam                        Vaccine distributed in high risk districts    61
Western Pacific (Guam, Saipan)  No data                                       No data

¹ Obtained from PATH: Japanese Encephalitis Morbidity, Mortality, and Disability:
Reduction and Control by 2015 (222) and WHO/IVB database, 193 WHO Member
States. Data as of September 2011 (216).

Table 7. Minimum, maximum, mean values and percent contribution of environmental
data layers for the Cx. tritaeniorhynchus model.

Variable  Description                                                     Min      Max      Mean     Percent contribution
------------------------------------------------------------------------------------------------------------------------
Alt       Altitude (elevation above sea level), m                         0        838      153.4    9.6
Bio01     Annual mean temperature, °C                                     8.2      28.9     23.3     4.4
Bio02     Mean diurnal range (mean of monthly (max temp - min temp)), °C  4.9      15       9.4      4.7
Bio03     Isothermality [(Bio2/Bio7)*100], °C                             2.2      9        5.1      1.8
Bio04     Temperature seasonality (standard deviation * 100), °C          30.1     1031.3   366      3.3
Bio05     Max temperature of the warmest month, °C                        25.8     42.5     33.4     1.2
Bio06     Min temperature of the coldest month, °C                        -12.5    24.4     12.4     3.3
Bio07     Temperature annual range (Bio5-Bio6), °C                        7.2      40.7     21       3.2
Bio08     Mean temperature of the wettest quarter¹, °C                    16.9     32.8     26.2     21.7
Bio09     Mean temperature of the driest quarter, °C                      -4.9     28.7     19.5     0.3
Bio10     Mean temperature of the warmest quarter, °C                     20.1     34.3     27.7     0.6
Bio11     Mean temperature of the coldest quarter, °C                     -4.9     27.2     18.3     1
Bio12     Annual precipitation, mm                                        152      4005     1610.3   16.2
Bio13     Precipitation of the wettest month, mm                          41       1011     319.4    5.9
Bio14     Precipitation of the driest month, mm                           0        233      30.8     6
Bio15     Precipitation seasonality (coefficient of variation), mm        18       138      74.5     5.8
Bio16     Precipitation of the wettest quarter, mm                        95       2455     797.4    1
Bio17     Precipitation of the driest quarter, mm                         0        786      114.9    0.7
Bio18     Precipitation of the warmest quarter, mm                        62       1015     467.2    0.7
Bio19     Precipitation of the coldest quarter, mm                        11       1812     260.7    8.5

¹ A quarter is a period of three months.


Table 8. Maxent model accuracy analysis using different sets of environmental data
inputs.

Environmental data input            AUC training points   AUC test points   P-value (minimum training presence)
---------------------------------------------------------------------------------------------------------------
Bioclimatic and Elevation data      0.971                 0.932             <0.0001
All layers (including rice crop)    0.968                 0.919             <0.0001
Bioclimatic data only               0.968                 0.929             <0.0001
Elevation data only                 0.822                 0.849             <0.0001
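
For readers unfamiliar with the AUC values in Table 8, the statistic can be sketched as a rank comparison. The sketch assumes, as is usual for presence-only models, that model scores at presence points are compared against scores at background points; the function name and the example scores are hypothetical and are not Maxent output.

```python
def auc(presence_scores, background_scores):
    """Probability that a random presence point outscores a random background point.

    Ties count as half a win. This is the rank-based (Mann-Whitney) form of
    the area under the receiver operating characteristic curve.
    """
    if not presence_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical model scores (not study data).
presence = [0.9, 0.8, 0.7, 0.4]
background = [0.5, 0.3, 0.2, 0.1, 0.6]
example_auc = auc(presence, background)
print(example_auc)
```

An AUC of 0.5 corresponds to a model that ranks presence points no better than chance, while values approaching 1.0, such as those in Table 8, indicate that presence points consistently receive higher scores than background points.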

Figure 2. Japanese encephalitis virus endemic area.
Map  adapted  from  CDC. 


Figure  3.  Distribution  of  known  Cx.  tritaeniorhynchus  locations  and  documented  human 
cases  of  JE  within  endemic  region. 


Figure  4.  Maxent  model  estimation  of  the  probability  of  Cx.  tritaeniorhynchus 
distribution  in  the  JE  endemic  region. 

Darker  areas  indicate  areas  that  are  likely  to  have  suitable  habitat  for  this  vector 
species while lighter areas indicate areas that are less suitable for the vector.

Figure 5. Human JE cases categorized by color based on the estimated probability of Cx.
tritaeniorhynchus  presence. 


Figure 6. Percent of 30 meter pixels classified as rice land cover within one square
kilometer  derived  from  the  GeoCover  Land  Cover  product. 

Gray  areas  indicate  no  data. 

CHAPTER 3: Characterization of Plasmodium ovale curtisi and P. ovale
wallikeri in Western Kenya Utilizing a Novel Species-specific Real-time
PCR Assay

Abstract


Background 

Plasmodium  ovale  is  comprised  of  two  genetically  distinct  subspecies,  P.  ovale 
curtisi  and  P.  ovale  wallikeri.  Although  P.  ovale  subspecies  are  similar  based  on 
morphology  and  geographical  distribution,  allelic  differences  indicate  that  P.  ovale  curtisi 
and  P.  ovale  wallikeri  are  genetically  divergent.  Additionally,  potential  clinical  and 
latency  duration  differences  between  P.  ovale  curtisi  and  P.  ovale  wallikeri  demonstrate 
the  need  for  investigation  into  the  contribution  of  this  neglected  malaria  parasite  to  the 
global  malaria  burden. 

Methods 

In  order  to  detect  all  P.  ovale  subspecies  simultaneously,  we  developed  an 
inclusive P. ovale-specific real-time PCR assay based on conserved regions between P.
ovale curtisi and P. ovale wallikeri in the reticulocyte binding protein 2 (rbp2) gene.
Additionally,  we  characterized  the  P.  ovale  subspecies  prevalence  from  22  asymptomatic 
malaria  infections  using  multilocus  genotyping  to  discriminate  P.  ovale  curtisi  and  P. 
ovale  wallikeri. 

Results 

Our  P.  ovale  rbp2  qPCR  assay  validation  experiments  demonstrated  a  linear 
dynamic  range  from  6.25  rbp2  plasmid  copies/microliter  to  100,000  rbp2  plasmid 
copies/microliter and a limit of detection of 1.5 rbp2 plasmid copies/microliter.

Specificity  experiments  showed  the  ability  of  the  rbp2  qPCR  assay  to  detect  low-levels  of 
P.  ovale  in  the  presence  of  additional  malaria  parasite  species,  including  P.  falciparum,  P. 
vivax,  and  P.  malariae.  We  identified  P.  ovale  curtisi  and  P.  ovale  wallikeri  in  Western 
Kenya by DNA sequencing of the tryptophan-rich antigen gene, the small subunit
ribosomal  RNA  gene,  and  the  rbp2  gene. 

Conclusions 

Our  novel  P.  ovale  rbp2  qPCR  assay  detects  P.  ovale  curtisi  and  P.  ovale 
wallikeri  simultaneously  and  can  be  utilized  to  characterize  the  prevalence,  distribution, 
and  burden  of  P.  ovale  in  malaria  endemic  regions.  Using  multilocus  genotyping,  we  also 
provided  the  first  description  of  the  prevalence  of  P.  ovale  curtisi  and  P.  ovale  wallikeri 
in  Western  Kenya,  a  region  holoendemic  for  malaria  transmission. 

Author  Summary 

Humans  can  be  infected  with  five  malaria  parasite  species:  Plasmodium 
falciparum,  P.  vivax,  P.  malariae,  P.  knowlesi,  and  P.  ovale.  Although  the  vast  majority 
of malaria morbidity and mortality worldwide can be attributed to P. falciparum, non-falciparum
malaria parasites can also cause clinical disease. Researchers use nucleic acid
based detection methods, such as polymerase chain reaction (PCR), to detect low-density
malaria  parasitemias  that  can  evade  microscopic  detection.  P.  ovale  was  recently 
identified  to  exist  as  two  subspecies,  P.  ovale  curtisi  and  P.  ovale  wallikeri,  that  look 
identical  but  differ  genetically.  In  this  study,  we  developed  a  novel  real-time  PCR 
(qPCR)  assay  to  detect  all  P.  ovale  parasites,  based  on  a  conserved  gene  between  P.  ovale 
curtisi  and  P.  ovale  wallikeri.  We  also  used  DNA  sequencing  to  differentiate  between  P. 
ovale  curtisi  and  P.  ovale  wallikeri  from  a  small  sample  of  P.  ovale  asymptomatic 
infections  in  Western  Kenya.  Through  the  use  of  our  novel  rbp2  qPCR  assay,  we  aim  to 
characterize  the  prevalence  of  P.  ovale  in  future  epidemiological  studies  in  order  to  better 
understand  this  neglected  malaria  parasite  species. 

Introduction


Plasmodium  ovale,  the  causative  agent  of  benign  tertian  malaria,  was  identified  as 
a  distinct  malaria  parasite  species  in  1922  based  on  its  characteristic  oval  morphology  in 
infected  erythrocytes  (273).  P.  ovale  rarely  causes  severe  disease  in  humans  living  in 
malaria  endemic  regions,  but  can  cause  serious  clinical  disease  in  naive  travelers  (64;  74; 
157; 165; 185; 249; 256; 274). The actual prevalence and clinical relevance of P. ovale are
likely  underestimated  for  the  following  reasons.  First,  P.  ovale  is  often  found  as  a  mixed 
infection  with  other  malaria  parasite  species  (46;  79;  198).  This  can  confound 
microscopic  identification  of  P.  ovale  due  to  difficulties  in  differentiating  P.  ovale  from 
other  morphologically  similar  malaria  parasites,  such  as  P.  vivax.  Second,  the 
characteristic  low-level  parasitemia  of  P.  ovale  infection  further  complicates  microscopic 
detection  due  to  the  difficulty  in  finding  and  identifying  low  numbers  of  P.  ovale 
parasites  (65).  Finally,  malaria  Rapid  Diagnostic  Tests  (RDTs)  show  a  reduced  ability  to 
detect  P.  ovale  compared  to  other  human  malaria  parasites,  resulting  in  false  negative 
cases (49; 73; 111). However, the use of extremely sensitive molecular detection
methods, such as polymerase chain reaction (PCR), has revealed a higher prevalence of
P.  ovale  and  expanded  the  geographical  distribution  of  this  malaria  parasite  compared  to 
what  was  previously  identified  based  on  microscopy  (27;  80;  139;  198;  262). 

Recent  findings  demonstrated  that  P.  ovale  exists  as  two  genetically  distinct 
sympatric  subspecies,  P.  ovale  curtisi  and  P.  ovale  wallikeri  (102;  213;  276;  318). 
Morphological  differences  between  the  two  P.  ovale  subspecies  have  not  been  identified, 
thereby  limiting  the  use  of  microscopy  to  differentiate  P.  ovale  curtisi  and  P.  ovale 
wallikeri.  As  recent  studies  suggest  potential  clinical  and  latency  duration  differences 
between the two P. ovale subspecies (207; 245), a discriminatory assay to differentiate
P. ovale curtisi and P. ovale wallikeri is clinically relevant. Additionally, initial P. ovale-specific
assays developed by our group and others were unknowingly designed based on
gene  sequences  specific  to  only  one  subspecies,  thereby  failing  to  detect  the  other  P. 
ovale  subspecies.  PCR  assays  that  target  conserved  genetic  regions  between  the  two 
subspecies  are,  therefore,  necessary  to  determine  the  true  P.  ovale  prevalence  and 
distribution  (38;  57;  101;  286). 

Small-subunit  ribosomal  RNA  (ssrRNA)  genes  are  common  targets  for  malaria 
parasite  species-specific  assays  based  on  nucleotide  polymorphisms  that  facilitate 
specific  detection  of  the  species  of  interest  (57;  101;  250).  Although  rRNA  based  PCR 
assays  have  proven  useful  for  the  detection  of  low-level  parasitemias  of  a  single  malaria 
parasite  species,  Demas  et  al.  demonstrated  that  alternative  gene  targets  may  be  more 
sensitive  for  species-specific  detection  in  the  context  of  mixed  species  infections  (76).  A 
quality  control  program  to  determine  the  ability  of  10  different  laboratories  to  detect 
malaria  parasite  species  based  on  rRNA  PCR  revealed  detection  of  P.  ovale  to  be  the 
most  difficult,  with  a  detection  rate  of  70%  (290).  Additionally,  allelic  diversity  within 
the  P.  ovale  ssrRNA  alleles  may  further  limit  the  ability  of  rRNA  specific  PCR  assays  to 
detect  P.  ovale  infections  (161).  Due  to  these  difficulties  in  the  detection  of  P.  ovale,  we 
designed a novel P. ovale-specific assay based on a gene found only in P. ovale curtisi
and  P.  ovale  wallikeri  and  not  present  in  other  human  malaria  parasite  species.  This 
approach  reduces  aberrant  amplification  of  non-target  malaria  species  and  allows  for  the 
detection  of  low-level  P.  ovale  infections  in  the  presence  of  high  parasitemias  of  other 
malaria parasite species, such as P. falciparum.

Several epidemiology surveys of extant malaria species have established the
endemicity  of  P.  ovale  in  Western  Kenya  based  on  microscopic  identification, 
entomological  studies,  and  nucleic  acid  detection  methods  (42;  65;  172;  211;  300). 
Clinical  cases  due  to  P.  ovale  relapse  in  non-immune  individuals  after  traveling  to 
Western  Kenya  have  also  been  reported,  including  a  single  case  of  a  returned  traveller 
with  P.  ovale  curtisi  infection  (207;  224).  However,  the  lack  of  data  on  the  prevalence 
and  distribution  of  P.  ovale  curtisi  and  P.  ovale  wallikeri  in  Western  Kenya  represents  a 
critical  gap  in  our  understanding  of  the  true  malaria  epidemiology  in  this  region  that 
could  impact  both  patient  treatment  and  malaria  control  strategies. 

In  this  study,  we  developed  a  novel,  highly  specific,  real-time  PCR  (qPCR)  assay 
to  detect  all  P.  ovale  subspecies  simultaneously  based  on  a  conserved  region  of  the  P. 
ovale-specific reticulocyte binding protein 2 (rbp2) gene. This inclusive P. ovale rbp2
qPCR  assay  was  characterized  and  validated  to  determine  the  sensitivity,  limit  of 
detection,  limit  of  quantification,  specificity,  repeatability,  and  reproducibility.  In 
addition,  the  occurrence  of  both  P.  ovale  subspecies  (P.  ovale  curtisi  and  P.  ovale 
wallikeri)  was  documented  in  Western  Kenya  using  multilocus  genotyping.  Our  P.  ovale 
species-specific  assay  can  be  utilized  to  better  characterize  the  presence,  parasitemia, 
geographical  distribution,  and  the  contribution  of  this  malaria  parasite  species  to  mixed 
species  infections  and  to  clinical  disease  in  malaria  endemic  regions. 

Methods 
Sample  Collection 

Anonymized  human  whole  blood  samples  were  collected  with  signed  informed 
consent  under  approved  protocols  (Walter  Reed  Army  Institute  of  Research  Human  Use 
and Review Committee Protocols #1720 and 1306, Kenya Medical Research Institute
(KEMRI)  SSC#2008  and  1111).  Clinically  healthy  (asymptomatic)  adult  individuals  in 
Nyanza Province, Kenya were screened (active detection) with the Parascreen Pan/Pf®
malaria  Rapid  Diagnostic  Test  (Zephyr  Biomedicals,  Verna,  Goa,  India)  for  the 
presence/absence  of  malaria  parasites  from  March  through  September  of  2008.  Thin  and 
thick  smears  were  examined  subsequently  by  up  to  5  expert  microscopists  in  the  Malaria 
Diagnostic  Centre  (MDC),  Kisumu,  Kenya  for  malaria  species  designation  and  estimation 
of quantitative parasitemia (214). Samples identified as positive for P. ovale (n=22) via
microscopy, all of which were mixed infections with other malaria species, were targeted
for  DNA  extraction  and  PCR  based  analysis.  DNA  was  extracted  from  200  microliters  of 
whole  blood  using  the  QIAamp  DNA  Minikit  (Qiagen,  Venlo,  Netherlands)  following  the 
manufacturer’s  protocol.  DNA  was  eluted  in  200  microliters  of  Buffer  EB  and  samples 
were  stored  at  -20°C  until  time  of  use.  A  human-specific  RNaseP  based  qPCR  assay  was 
performed  for  each  sample  in  duplicate  to  confirm  successful  nucleic  acid  extraction 
(136). 

Characterization  of  P.  ovale  subspecies  in  Western  Kenya 
Tryptophan-rich  antigen  (tra)  gene 

The  P.  ovale- specific  tryptophan-rich  antigen  (tra)  gene  was  recently  identified  as 
a  target  to  discriminate  between  P.  ovale  subspecies  based  on  DNA  sequence  length  and 
single  nucleotide  polymorphisms  (SNPs)  (213;  276;  286).  We  utilized  the  PoTRA  fwd3 
and PoTRA rev3 primers reported in Oguike et al. 2011 for PCR analysis (213). Primers
(Table 9) were synthesized by Integrated DNA Technologies (IDT, Coralville, IA, USA)
and purified by standard desalting methods. Each PCR assay consisted of 1X Sigma
JumpStart REDTaq ReadyMix (20 mM Tris-HCl, 100 mM KCl, 4 mM MgCl2, 0.4 mM
of each dNTP, 0.03 unit/µL of Taq DNA polymerase, Sigma, Balcatta, WA, USA), 8.75
picomoles of each primer, and one microliter of template with a final volume of 25
microliters. PCR cycling conditions were: initial denaturation for 2 minutes at 95°C
followed  by  45  cycles  of  95°C  for  30  seconds,  58°C  for  45  seconds,  72°C  for  1  min  and  a 
final  extension  at  75°C  for  5  minutes.  All  conventional  PCRs  were  performed  on  a  DNA 
Engine  PTC-200  Thermal  Cycler  (MJ  Research,  Waltham,  MA,  USA). 
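
As a quick check on the reaction setup, a primer amount given in picomoles can be converted to a final concentration. The short Python sketch below is illustrative only (the function name is not part of the original protocol) and relies on the identity that 1 picomole per microliter equals 1 micromolar.

```python
def final_concentration_uM(picomoles: float, reaction_volume_ul: float) -> float:
    """Convert an amount in pmol to a final concentration in micromolar.

    1 pmol/µl = 1e-12 mol / 1e-6 L = 1e-6 mol/L = 1 µM.
    """
    if reaction_volume_ul <= 0:
        raise ValueError("reaction volume must be positive")
    return picomoles / reaction_volume_ul


# tra PCR described above: 8.75 pmol of each primer in a 25 µl reaction
tra_primer_uM = final_concentration_uM(8.75, 25.0)
print(f"tra primers: {tra_primer_uM:.2f} µM each")
```

The same conversion applies to any of the other reactions in this chapter by substituting the stated picomoles and final volume.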

Reticulocyte  binding  protein  2  (rbp2)  gene 

The reticulocyte binding protein 2 (rbp2) gene was utilized by Oguike et al. 2011
to  differentiate  between  P.  ovale  subspecies  using  qPCR  melt  curve  profiles  based  on  six 
SNPs  present  within  a  120  base  pair  fragment.  We  designed  a  novel  set  of  primers  (Table 
9,  IDT)  using  Primer  Express  software  (Life  Technologies,  version  3.0;  Frederick,  MD, 
USA)  to  amplify  a  smaller,  74  base  pair  region  of  the  rbp2  gene  for  assay  development. 
Our  primers  (PoRBP2f  and  PoRBP2r)  are  located  within  conserved  DNA  sequences  of 
the  P.  ovale  subspecies  to  ensure  detection  and  amplification  of  both  P.  ovale  subspecies. 
The  amplicon  also  contains  a  single  SNP  to  distinguish  P.  ovale  subspecies  by  DNA 
sequencing.  Figure  7  shows  the  single  SNP  in  the  rbp2  amplicon  at  position  431,  in 
which  P.  ovale  curtisi  contains  an  adenine  and  P.  ovale  wallikeri  contains  a  thymine. 
Primer  BLAST  was  utilized  to  ensure  our  primers  were  specific  for  P.  ovale  and  would 
not amplify non-P. ovale malaria parasite DNA or human DNA. PCRs consisted of 1X
Sigma  JumpStart  REDTaq  ReadyMix  Reaction  Mix,  25  picomoles  of  each  primer,  and 
one  microliter  of  template,  with  a  final  volume  of  25  microliters.  PCR  cycling  conditions 
were as follows: initial denaturation at 95°C for 2 minutes followed by 40 cycles of 95°C
for 30 seconds, 55°C for 30 seconds, 72°C for 30 seconds, and a final extension at 72°C
for 10 minutes.

Small  subunit  ribosomal  RNA  (ssrRNA)  gene 

We utilized P. ovale-specific primers (Table 9, IDT) reported by Fuehrer et al.
2012 (rOVA1WC and rOVA2WC) to further characterize P. ovale positive samples
based on differences within the small subunit ribosomal RNA (ssrRNA) gene (101).
PCRs consisted of 1X Sigma JumpStart REDTaq ReadyMix Reaction Mix, 25 picomoles
of each primer, and one microliter of template, with a final volume of 25 microliters. PCR
cycling  conditions  were  as  follows:  initial  denaturation  at  95°C  for  4  minutes  followed  by 
35  cycles  of  94°C  for  1  minute,  58°C  for  2  minutes,  72°C  for  2  minutes,  and  a  final 
extension  at  72°C  for  5  minutes. 

DNA  sequencing 

PCR  products  were  visualized  on  0.7%  agarose  gels  stained  with  ethidium 
bromide.  PCR  products  were  cloned  into  the  pCR  2.1-TOPO  TA  vector  (Life 
Technologies)  based  on  manufacturer’s  guidelines.  Plasmid  purification  was  performed 
using  the  QIAprep  Spin  Miniprep  kit  (Qiagen)  and  used  as  template  for  sequencing 
reactions. PCR products were sequenced using the M13 Forward (-20) Primer (Life
Technologies)  at  the  Biomedical  Instrumentation  Center  at  the  Uniformed  Services 
University  or  GENEWIZ  Inc  (Germantown,  MD,  USA)  using  the  ABI  3500XL  Genetic 
Analyzer and the ABI 3730XL DNA Analyzer, respectively. The sequencing facility was
chosen based on availability at the time. DNA sequences were aligned and analyzed with
previously published sequences using SeqMan software (DNAStar Lasergene Version
8.1.5, Madison, WI, USA). Reference sequences utilized for DNA alignments are shown
in Table 10.

Real-time  PCR  assay  to  detect  P.  ovale 

Primer  Express  software  (Life  Technologies,  version  3.0)  was  utilized  to  design  a 
hydrolysis  probe  (Table  9)  for  use  with  our  rbp2  primers  on  the  ABI  7500  fast  real-time 
PCR  (qPCR)  platform  (Life  Technologies).  An  alignment  of  the  P.  ovale  rbp2  DNA 
sequences  was  constructed  using  the  Clustal  Omega  Program  provided  by  the  European 
Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI) (110;
259). We utilized the Jalview output tool to visualize the DNA sequence alignment
(Figure 7) (314). Primers and probe were designed to amplify a conserved region
within the rbp2 gene to ensure simultaneous detection of both P. ovale subspecies by our
qPCR assay. In silico analyses were performed to ensure primers and probe were
specific to P. ovale and would not amplify genes of other malaria parasites or human
DNA. Each qPCR reaction consisted of the following: 1X TaqMan Fast Universal PCR
Master  Mix,  No  AmpErase  UNG  (Life  Technologies,  Cat  No.  4364103),  5  picomoles  of 
each  primer  and  probe,  and  one  microliter  of  template  in  a  final  volume  of  20  microliters. 
Real-time  PCR  was  performed  utilizing  fast  thermal  cycling  conditions  (95°C  for  20 
seconds,  followed  by  40-60  cycles  of  95°C  for  3  seconds  and  60°C  for  30  seconds). 
Analysis  of  qPCR  results  was  performed  using  ABI  7500  Fast  Real-Time  PCR  Systems 
Software  (Life  Technologies,  Version  2.0.5).  Basic  statistical  analyses  (means,  standard 
deviations,  coefficient  of  variation),  generation  of  standard  curve  graphs,  calculation  of 
slopes,  and  coefficient  of  correlation  were  performed  in  Microsoft  Excel  or  GraphPad 
Prism (GraphPad Prism Software Version 6, La Jolla, CA, USA).

Plasmid standard curve


We  cloned  the  74  base  pair  rbp2  amplicon  into  the  pCR  2.1-TOPO  TA  vector 
(Life  Technologies)  following  manufacturer’s  guidelines  and  eluted  the  rbp2  plasmid  in 
PCR  grade  water.  The  approximate  rbp2  amplicon  copy  number  per  microliter  was 
determined  based  on  spectrophotometer  (Nanodrop  2000c)  concentration  in  nanograms 
per  microliter.  Plasmids  with  the  rbp2  amplicon  ( rbp2  plasmid)  were  diluted  in  water  to 
generate  a  ten-fold  serial  dilution  from  100,000  rbp2  copies  per  microliter  to  0.1  rbp2 
copies  per  microliter.  The  resulting  non-linearized  ten-fold  serial  dilution  series  was 
utilized  as  a  standard  curve  in  subsequent  validation  experiments  including  determination 
of  the  linear  dynamic  range,  specificity,  reproducibility,  repeatability,  and  limit  of 
detection.  The  effect  of  the  conformation  of  the  rbp2  plasmid  on  standard  curve  linearity 
was analyzed by linearizing the rbp2 plasmid using the NotI restriction enzyme (New
England  BioLabs  Inc,  Ipswich,  MA,  USA)  according  to  the  manufacturer’s  protocol. 
Rbp2  plasmid  linearization  was  confirmed  by  gel  electrophoresis  on  a  0.7%  agarose  gel 
stained  with  ethidium  bromide.  Linearized  rbp2  plasmid  was  purified  using  the  Qiagen 
PCR Purification Kit following the manufacturer’s protocol. The approximate rbp2 copy
number per microliter of the linearized rbp2 plasmid was determined, and the plasmid was diluted in water
to generate a ten-fold serial dilution (100,000 to 0.1 copies per microliter). The rbp2
standard  curve  PCR  efficiency  and  coefficient  of  correlation  (R2)  were  determined  and 
the  Pearson  product-moment  correlation  was  used  to  compare  the  linearized  and  non- 
linearized  rbp2  plasmid  standard  curves  (GraphPad  Prism). 
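
The text does not state the exact conversion used to obtain copy number from the spectrophotometer reading. A common approach, sketched below in Python under stated assumptions, divides the DNA mass by the approximate molar mass of the plasmid and multiplies by Avogadro's number. The vector backbone length (taken here as 3931 base pairs), the average mass of 650 g/mol per base pair, and the 10 ng/µl stock concentration are assumptions for illustration and should be checked against the vector map and the actual reading.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # approximate average mass of one double-stranded base pair

# Assumed lengths: the vector backbone size is a nominal value and should be
# checked against the vector map; the insert is the 74 bp rbp2 amplicon.
VECTOR_BP = 3931
INSERT_BP = 74


def copies_per_ul(ng_per_ul: float, plasmid_bp: int = VECTOR_BP + INSERT_BP) -> float:
    """Approximate plasmid copies per microliter from a DNA concentration."""
    grams_per_ul = ng_per_ul * 1e-9
    moles_per_ul = grams_per_ul / (plasmid_bp * BP_MASS_G_PER_MOL)
    return moles_per_ul * AVOGADRO


def tenfold_series(start: float, end: float) -> list[float]:
    """Ten-fold dilution levels from start down to end (inclusive)."""
    levels = []
    current = start
    while current >= end * 0.999:   # tolerance for floating-point drift
        levels.append(current)
        current /= 10.0
    return levels


stock = copies_per_ul(10.0)                 # hypothetical 10 ng/µl stock
series = tenfold_series(100_000, 0.1)       # 100,000 down to 0.1 copies/µl
print(f"stock: {stock:.3e} copies/µl")
print(series)
```

The dilution helper simply lists the seven target concentrations of the standard curve; the volumes needed to reach each level depend on the stock concentration actually measured.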

Validation experiments

Real-time  PCR  efficiency  was  determined  using  a  standard  curve  of  10-fold  serial 
dilutions of the non-linearized rbp2 plasmid. Efficiency (E) was calculated using the
following formula: $E = 10^{-1/\mathrm{slope}} - 1$. Rbp2 plasmid standard curve samples were run at
least  in  duplicate  and  the  mean  quantification  cycle  (Cq)  value  was  utilized  to  generate 
the  standard  curve.  The  limit  of  detection  was  defined  as  the  concentration  of  rbp2 
plasmid  in  copies  per  microliter  that  gave  a  positive  signal  in  at  least  one  replicate  well  in 
two  separate  qPCR  experiments.  Limit  of  quantification  was  defined  as  the  range  of  rbp2 
plasmid  concentrations  that  maintained  linearity  and  therefore  could  be  used  to  quantify 
P.  ovale  concentration  from  test  samples. 
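
The efficiency calculation can be sketched in Python as follows. This is a minimal illustration, not the software used in the study: the slope comes from an ordinary least-squares fit of mean Cq against log10 of the plasmid copy number, and efficiency is then 10^(-1/slope) - 1. The Cq values are made up for the example.

```python
import math


def fit_standard_curve(copies, cq_values):
    """Least-squares fit of Cq against log10(copies).

    Returns (slope, intercept, r_squared).
    """
    x = [math.log10(c) for c in copies]
    y = list(cq_values)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def efficiency_from_slope(slope):
    """E = 10^(-1/slope) - 1; a slope near -3.32 corresponds to E near 1 (100%)."""
    return 10 ** (-1.0 / slope) - 1.0


# Made-up mean Cq values for a ten-fold series (illustration only, not study data)
copies = [100_000, 10_000, 1_000, 100, 10]
mean_cq = [21.0, 24.4, 27.8, 31.2, 34.6]

slope, intercept, r2 = fit_standard_curve(copies, mean_cq)
print(f"slope={slope:.2f}, efficiency={efficiency_from_slope(slope):.1%}, R2={r2:.4f}")
```

A perfectly doubling reaction gives a slope of about -3.32 and an efficiency of 100%; shallower or steeper slopes move the efficiency above or below that value.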

Specificity was analyzed using DNA template from non-P. ovale malaria parasite
species  and  uninfected  human  DNA.  Genomic  DNAs  from  P.  falciparum  strains  3D7 
(WRAIR),  FCR3CSA  (ATCC/BEI  Resources,  MR4,  Manassas,  Virginia),  Dd2 
(ATCC/BEI  Resources,  MR4),  and  NF54  (ATCC/BEI  Resources,  MR4)  were  utilized  as 
template  to  assess  specificity.  P.  vivax  genomic  DNA  was  extracted  from  frozen  whole 
parasites  (kind  gift  of  Dr.  J.  Prachumsri,  Mahidol  University,  Bangkok,  Thailand).  Since 
pure  P.  malariae  positive  samples  were  unavailable,  we  utilized  three  samples  collected 
as  part  of  the  blood  collection  protocol  in  Kenya  that  were  positive  for  P.  malariae  as 
well  as  P.  falciparum  by  microscopy  and  PCR,  but  were  negative  for  P.  ovale.  The  P. 
malariae  parasitemias  ranged  from  approximately  30  to  2400  parasites  per  microliter. 
Additionally,  genomic  DNAs  from  P.  knowlesi,  P.  simiovale,  P.  fragile,  and  P. 
cynomolgi  (ATCC/BEI  Resources,  MR4),  were  also  utilized  as  templates.  Specificity  was 
further analyzed by performing spiking experiments in which a known concentration of
rbp2 plasmid was added to template containing P. falciparum 3D7 DNA (10,000
parasites  per  microliter)  or  P.  vivax  DNA  (517  parasites  per  microliter).  One-way 
analysis  of  variance  (ANOVA)  was  used  to  determine  differences  in  Cq  values  for 
spiking  experiments  (GraphPad  Prism). 

Within-run  repeatability  was  defined  as  the  variation  of  Cq  values  within  a  single 
run  and  was  analyzed  by  calculating  the  percent  coefficient  of  variation  (%CV)  of  Cq 
values  in  replicate  wells.  Between-run  repeatability  was  defined  as  the  variation  of  Cq 
values  in  separate  qPCR  runs  and  was  determined  by  calculating  the  percent  coefficient 
of  variation  (%CV)  of  mean  Cq  values  based  on  six  separate  qPCR  experiments. 
Reproducibility was evaluated by comparing assay performance by a technician at the
USAMRU-K laboratory in Kisumu, Kenya with that at the Uniformed Services University in
Bethesda, Maryland, USA.
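
The percent coefficient of variation used for both repeatability measures is the standard deviation divided by the mean, expressed as a percentage. The Python sketch below uses the sample standard deviation, which is an assumption because the text does not say whether the sample or population form was used. The Cq values are hypothetical.

```python
import statistics


def percent_cv(values):
    """Percent coefficient of variation: 100 * sample SD / mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero; %CV is undefined")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical Cq values (illustration only, not study data)
replicate_wells = [24.41, 24.38, 24.52]            # one dilution, one run
run_means = [24.4, 24.9, 24.1, 24.6, 25.0, 24.3]   # same dilution, six runs

print(f"within-run %CV:  {percent_cv(replicate_wells):.2f}")
print(f"between-run %CV: {percent_cv(run_means):.2f}")
```

Within-run %CV is computed from replicate wells of a single run, while between-run %CV is computed from the mean Cq of each separate run.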

Quantification  comparison:  microscopy  versus  rbp2  qPCR 

Parasitemias  were  determined  for  P.  ovale  positive  blood  films  using  standard 
microscopic  methods  at  the  Malaria  Diagnostic  Centre,  affiliated  with  both  USAMRU-K 
and  KEMRI,  in  Kisumu,  Kenya.  DNA  was  extracted  from  microscopy-positive  P.  ovale 
samples and tested using the P. ovale-specific rbp2 qPCR assay. Approximate rbp2 copy
number per microliter was determined based on the rbp2 plasmid standard curve.
Parasitemias as determined by expert microscopy (parasites per microliter) were
compared to rbp2 copy number per microliter as determined by the P. ovale-specific
qPCR  in  order  to  examine  potential  correlation  between  rbp2  plasmid  copy  number  and 
microscopic  parasitemias. 
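
Reading a copy number off the standard curve amounts to inverting the fitted line Cq = slope x log10(copies) + intercept. The Python sketch below shows that inversion and one way to summarize agreement with microscopy as a squared Pearson correlation. The curve parameters and sample values are hypothetical, and comparing on a log10 scale is a choice made here for illustration; the text does not state the scale used.

```python
import math


def copies_from_cq(cq, slope, intercept):
    """Invert the standard curve Cq = slope * log10(copies) + intercept."""
    return 10 ** ((cq - intercept) / slope)


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical curve parameters and sample values (illustration only)
SLOPE, INTERCEPT = -3.4, 38.0
sample_cq = [30.1, 27.5, 33.0, 29.2]
microscopy = [150.0, 900.0, 40.0, 310.0]       # parasites per microliter

qpcr_copies = [copies_from_cq(cq, SLOPE, INTERCEPT) for cq in sample_cq]
# Compare on a log10 scale, since both measures span orders of magnitude
r2 = r_squared([math.log10(m) for m in microscopy],
               [math.log10(c) for c in qpcr_copies])
print([round(c, 1) for c in qpcr_copies], round(r2, 3))
```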

Results


P.  ovale  subspecies  characterization 
Human-specific  RNaseP  qPCR 

A  previously  described  qPCR  assay  based  on  the  human-specific  RNaseP  gene 
was performed to confirm the presence of nucleic acid after DNA extraction (136). The
human RNaseP gene was detected from all 22 samples (Average Cq=29.12, Cq
Range=28.2-32.87, standard deviation=1.02), indicating extraction methods yielded DNA
suitable  for  subsequent  PCR  experiments. 

Tryptophan-rich  antigen  (tra)  gene 

Alignments of tra gene sequences revealed nine samples (40.9%) positive for P.
ovale curtisi type 1, two samples (9.1%) positive for P. ovale curtisi type 2, six samples
(27.3%) positive for P. ovale wallikeri type 1, and three samples (13.6%) positive for P.
ovale wallikeri type 2 (Table 11). Previously published GenBank sequences were
utilized as reference sequences for alignment, and their accession numbers are shown in Table 10.
Representative P. ovale curtisi type 1, P. ovale curtisi type 2, P. ovale wallikeri type 1,
and P. ovale wallikeri type 2 tra DNA sequences were deposited under GenBank
accession numbers KM494978-KM494981, respectively, and are identical to the
reference sequences. As shown in Table 12, unique polymorphisms within the tra gene
were also detected and confirmed by at least two separate sequencing reactions for 5
samples: Po05, Po12, Po20, Po06, and Po07 (Accession numbers KM494982-KM494986,
respectively). Samples Po12 and Po20 contained an 18 base pair insertion
between nucleotide positions 171 and 172 (based on P. ovale wallikeri type 1 HM594180
reference sequence), which represents a short sequence repeated throughout the tra gene.
Two samples, Po9 and Po18, failed to amplify with the tra primers despite multiple PCR
attempts.

Reticulocyte  binding  protein  2  (rbp2)  gene 

DNA sequences of the rbp2 gene were obtained for all 22 P. ovale samples (Table
11). Table 14 contains the 74 base pair rbp2 amplicon for both P. ovale curtisi and P. ovale
wallikeri. These sequences were not eligible for submission as the minimum length
requirement  for  GenBank  is  200  nucleotides.  P.  ovale  subspecies  results  based  on  rbp2 
gene  sequences  agreed  with  subspecies  results  based  on  the  tra  gene  sequences.  Thirteen 
(59%)  of  the  P.  ovale  samples  were  positive  for  P.  ovale  curtisi  and  9  (41%)  were 
positive  for  P.  ovale  wallikeri.  None  of  our  samples  failed  to  amplify  with  the  rbp2 
primers. 

Small  subunit  rRNA  (ssrRNA)  gene 

Nineteen  of  the  22  P.  ovale  positive  samples  were  detected  by  the  ssrRNA  gene 
assay (Table 11). P. ovale curtisi and P. ovale wallikeri ssrRNA sequences were
approximately  99%  identical  to  previously  published  sequences  at  this  locus. 
Representative  P.  ovale  curtisi  and  P.  ovale  wallikeri  ssrRNA  sequences  were  deposited 
in  GenBank  as  KM494987  and  KM494988,  respectively.  P.  ovale  subspecies  results 
based  on  ssrRNA  gene  sequences  agreed  with  subspecies  results  based  on  tra  and  rbp2 
gene sequences. Three samples, Po9, Po11, and Po18, failed to amplify using the ssrRNA
primers despite a second attempt using an additional microliter of template DNA.

Real-time PCR to detect P. ovale


Plasmid  standard  curve  analysis  of  rbp2  qPCR  assay 

Since  all  22  P.  ovale  microscopy  positive  samples  were  successfully  amplified 
and  sequenced  using  the  rbp2  primers,  we  developed  an  rbp2  based  qPCR  assay  to  detect 
all  P.  ovale  subspecies  simultaneously  in  a  single  assay.  Efficiency  of  the  rbp2  qPCR 
assay  was  analyzed  using  the  non-linearized  rbp2  plasmid  10-fold  serial  dilution  standard 
curve.  Efficiency  ranged  from  90%-99%  for  six  consecutive  qPCR  experiments  with  a 
coefficient  of  correlation  (R2)  greater  than  0.99.  A  representative  qPCR  amplification  plot 
and  standard  curve  are  shown  in  Figures  8  and  9,  respectively.  All  22  P.  ovale  samples 
identified  as  P.  ovale  positive  by  expert  microscopy  were  detected  using  our  rbp2  qPCR 
assay.  There  was  no  difference  in  PCR  efficiency  or  R2  value  based  on  the  conformation 
(linearized vs. non-linearized) of the rbp2 plasmid standard curve (Pearson product-moment
correlation=0.998, P<0.001).

Limit  of  quantification  and  limit  of  detection 

The  linear  dynamic  range  of  the  rbp2  qPCR  assay  was  determined  to  be  between 
6.25  copies  per  microliter  and  100,000  copies  per  microliter  based  on  serial  dilutions  of 
the  rbp2  plasmid.  Two-fold  serial  dilutions  of  known  concentrations  of  the  rbp2  plasmid 
were  performed  in  at  least  duplicate  to  determine  the  limit  of  detection.  Dilutions 
containing 1.5 copies per microliter of the rbp2 plasmid were detected by at least one
replicate  well  in  two  separate  qPCR  experiments. 
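
The limit-of-detection rule (a positive signal in at least one replicate well in two separate experiments) can be expressed as a short Python sketch. The starting concentration of 100 copies per microliter for the two-fold series is an assumption, since the text does not state it; it is chosen because 6.25 and roughly 1.5 copies per microliter would both fall on such a series. The well calls are hypothetical.

```python
def twofold_series(start, n_levels):
    """Two-fold dilution levels beginning at start."""
    return [start / (2 ** i) for i in range(n_levels)]


def meets_lod_rule(experiments):
    """True if at least one replicate well is positive in at least two experiments.

    experiments: list of per-experiment lists of booleans (True = well positive).
    """
    positive_experiments = sum(1 for wells in experiments if any(wells))
    return positive_experiments >= 2


# Assumed starting point of 100 copies/µl; the source does not state it.
levels = twofold_series(100.0, 7)
print(levels)

# Hypothetical well calls for one dilution level across two experiments
print(meets_lod_rule([[True, False], [False, True]]))
print(meets_lod_rule([[True, True], [False, False]]))
```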

Specificity 

In  order  to  test  the  specificity  of  our  rbp2  assay  for  P.  ovale,  we  performed  qPCR 
using DNA isolated from cultured P. falciparum 3D7 (10,000 parasites per microliter)
and P. vivax DNA (517 parasites per microliter). Based on a series of ten separate qPCR
experiments, DNA from P. falciparum and P. vivax was uniformly negative. To ensure
no background from other P. falciparum strains, we tested genomic DNAs from strains
Dd2, NF54, and FCR3CSA, which were also not detected by our assay. We tested DNA
from P. knowlesi, P. fragile, and P. cynomolgi and found that DNA from these malaria
parasite species was undetectable by our rbp2 qPCR assay. As we were unable to obtain
pure  P.  malariae  samples,  we  examined  DNA  samples  isolated  from  the  blood  of 
individuals  co-infected  with  both  P.  malariae  and  P.  falciparum.  These  P.  falciparum  and 
P.  malariae  co-infected  samples  were  also  negative,  indicating  that  our  rbp2  qPCR  assay 
does  not  detect  P.  malariae  DNA.  Two  different  control  DNA  samples  from  malaria 
uninfected  human  blood  were  also  uniformly  negative.  All  specificity  experiments  were 
carried  out  to  60  cycles  in  an  attempt  to  capture  non-specific  amplification,  which  was 
never seen, although the standard curve and the P. ovale-containing field samples
amplified  appropriately. 

Spiking  experiments,  in  which  P.  falciparum  DNA  or  P.  vivax  DNA  was  added  to 
the  rbp2  plasmid  standard  curve  samples  and  subsequently  utilized  as  template  for  the 
rbp2 qPCR, did not significantly alter the Cq values compared to when the standard curve
plasmid  samples  were  run  alone  (ANOVA,  P  =  0.9993,  Figure  10). 

Interestingly,  our  rbp2  qPCR  assay  detected  P.  simiovale  genomic  DNA  isolated 
from  filter  paper.  DNA  sequencing  utilizing  the  rbp2  primers  revealed  that  the  74  base 
pair  rbp2  region  in  P.  simiovale  is  identical  to  that  in  P.  ovale  curtisi.  Subsequent 
attempts  using  additional  primers  to  sequence  the  full-length  rbp2  gene  of  P.  simiovale 
were not successful. As these additional primers successfully amplified P. ovale positive
samples, the inability to amplify the full-length P. simiovale rbp2 gene is likely due to
sequence  polymorphisms  between  P.  ovale  and  P.  simiovale  in  the  primer  binding 
regions. 

Repeatability 

Within-run  repeatability  of  the  rbp2  plasmid  standard  curve  Cq  values  was  high, 
with the percent coefficient of variation (%CV) of dilution series replicates between 0.00%
and 2.23% (Table 13). Results were also repeatable between runs, with the percent
coefficient of variation (%CV) between 1.17% and 3.43% (Table 13). Repeatability was
determined  using  results  from  six  separate  consecutive  qPCR  experiments. 

Reproducibility 

Analysis  of  the  efficiency  of  the  rbp2  assay  was  performed  independently  at  the 
USAMRU-K  laboratory.  A  known  concentration  of  non-linearized  rbp2  plasmid  was 
diluted  in  PCR  grade  water  to  generate  a  10-fold  dilution  standard  curve  for  PCR 
efficiency analysis. The assay was performed with the same P. ovale-specific primers and
probe  utilized  in  validation  experiments  in  a  final  volume  of  50  microliters  of  Life 
Technologies  TaqMan  Fast  Master  Mix  for  the  USAMRU-K  ABI  7500.  Despite  slight 
variations  in  qPCR  set  up  and  cycling  conditions,  the  Kenya  laboratory  obtained  a  PCR 
efficiency of 93.6% with an R2 >0.99 for the standard curve analysis. These results are
comparable to the PCR efficiencies and R2 values obtained at USU. The USAMRU-K
laboratory  also  performed  specificity  experiments  and  demonstrated  no  amplification 
from  P.  falciparum  DNA,  DNA  from  uninfected  human  blood,  or  from  negative  template 
controls.

Quantification comparison: microscopy versus rbp2 qPCR

Quantitative  parasitemia  determined  by  expert  microscopy  (parasites  per 
microliter)  was  compared  to  the  rbp2  copy  number  per  microliter  based  on  the  rbp2 
plasmid standard curve (Figure 11). A modest correlation was determined (R2=0.6595).
This lack of a strong correlation is not surprising, as all P. ovale parasitemias were low,
ranging from 16-3800 parasites/µl, and such low-level parasitemias are notoriously
difficult to quantify accurately by microscopy (84; 209; 210; 214). Additionally, the
samples utilized for comparison were mixed malaria species infections, mainly with P.
falciparum. Mixed species infections create further difficulties for the accurate
quantification of P. ovale-specific parasitemia based on light microscopy, but
single-species P. ovale infected samples were not available.

Discussion 

Based  on  multilocus  genotyping  using  the  rbp2,  ssrRNA,  and  tra  genes,  we 
detected  both  P.  ovale  curtisi  and  P.  ovale  wallikeri  in  approximately  equal  frequencies 
in  a  small  sample  set  from  Western  Kenya,  a  region  in  which  P.  ovale  subspecies 
characterization  had  not  been  previously  performed.  The  presence  of  both  P.  ovale 
subspecies  in  Western  Kenya  is  in  agreement  with  other  studies  in  sub-Saharan  Africa 
and  P.  ovale  endemic  regions  that  describe  the  sympatric  distribution  of  P.  ovale  curtisi 
and  P.  ovale  wallikeri  (38;  101;  213).  We  also  identified  additional  allelic  diversity 
within  the  tra  gene  in  P.  ovale  samples  from  Kenya  (Table  12)  compared  to  what  was 
previously  identified  in  P.  ovale  samples  from  other  malaria  endemic  regions  (213).  This 
allelic diversity at the P. ovale tra gene is consistent with reports of other tra variants
identified by DNA sequencing; however, our tra sequences are distinct from previously
published tra gene sequences (286).

Our new inclusive P. ovale-specific qPCR assay is based on rbp2, a gene that
contains  conserved  regions  between  P.  ovale  curtisi  and  P.  ovale  wallikeri  but  that  is 
absent  from  other  human  malaria  parasite  species.  The  rbp2  qPCR  assay  described  herein 
allows  simultaneous  detection  of  both  P.  ovale  subspecies  using  a  single  set  of  primers 
and  probe.  All  22  samples  were  detected  and  sequenced  using  our  rbp2  primers, 
highlighting  the  utility  of  these  primers  for  P.  ovale  identification.  P.  ovale  subspecies 
differentiation  by  DNA  sequencing  of  the  74  base  pair  rbp2  sequence  region  was  in 
absolute  agreement  with  tra  and  ssrRNA  DNA  sequencing  results.  This  again  emphasizes 
the utility of the PoRBP2fwd1 and PoRBP2rev1 primers for P. ovale subspecies
discrimination based on a single SNP at position 431 (Figure 7) located between these
primers.  In  agreement  with  other  previous  studies,  these  data  demonstrate  perfect 
dimorphism  between  P.  ovale  curtisi  and  P.  ovale  wallikeri,  providing  further  support  for 
the  separation  of  the  two  P.  ovale  subspecies  (101;  102;  140;  213;  276;  279;  318).  As  we 
begin  to  understand  potential  clinical,  pathological,  and  biological  differences  between 
the  two  P.  ovale  subspecies,  molecular  methods  to  distinguish  P.  ovale  curtisi  and  P. 
ovale  wallikeri  will  aid  in  these  research  efforts.  Additionally,  as  genomic  data  and  full 
genome sequences become available for P. ovale curtisi and P. ovale wallikeri,
phylogenetic  analyses  to  determine  the  evolutionary  relatedness  between  these  and  other 
malaria  species  will  likely  further  our  understanding  of  these  newly  characterized  but 
poorly understood human parasites.

Using the rbp2 plasmid as a standard curve, the linear dynamic range of our assay
was determined to be between 6.25 copies of rbp2 per microliter and 100,000 copies of
rbp2 per microliter. The lower, non-linear but still clearly positive limit of detection of
our assay was determined to be 1.5 copies of rbp2 per microliter, confirming this assay’s
capacity  to  detect  low-level  parasitemias.  P.  ovale  parasitemias  are  characteristically 
lower  than  other  malaria  species,  so  we  limited  the  testing  of  our  upper  dynamic  range  to 
100,000  rbp2  copies  per  microliter,  as  higher  copy  numbers  would  likely  be 
epidemiologically  and  clinically  irrelevant.  We  used  the  rbp2  plasmid  to  determine  the 
linear  dynamic  range  and  limit  of  detection  because  of  difficulties  obtaining  pure  P.  ovale 
infected  samples  from  malaria  endemic  regions  and  the  inability  to  culture  P.  ovale 
parasites.  The  paucity  of  published  genomic  information  for  P.  ovale  also  hinders  the 
determination of copy number of P. ovale-specific genes, such as the rbp2, tra, and
ssrRNA  genes,  utilized  in  this  study.  Thus,  we  are  further  limited  in  our  attempts  to 
appropriately  correlate  rbp2  copy  number  and  P.  ovale  parasitemias.  Despite  these 
limitations, we demonstrate the utility of our P. ovale-specific assay to detect low levels
of  the  rbp2  plasmid  and  to  detect  low  P.  ovale  parasitemias  (as  low  as  16  parasites  per 
microliter)  from  human  blood  samples  collected  in  Western  Kenya.  Our  study  was  also 
limited  by  only  testing  samples  collected  in  Western  Kenya  and  additional  validation  is 
therefore  needed  to  confirm  the  ability  of  the  rbp2  qPCR  assay  to  detect  total  P.  ovale 
from  other  malaria  endemic  regions.  As  the  22  samples  included  in  this  study  were 
identified  as  P.  ovale  by  microscopy,  further  studies  are  needed  to  test  the  P.  ovale  rbp2 
qPCR  assay  with  submicroscopic  and  asymptomatic  P.  ovale  infections  with  a  range  of 
parasitemias.

Repeatability and reproducibility of qPCR assays are important components of
assay  validation  as  they  indicate  the  assay’s  capacity  to  provide  consistent  and  reliable 
results  in  different  environments.  Different  users  under  modified  laboratory  conditions 
performed  this  assay  successfully,  with  high  PCR  efficiency  and  equivalent 
quantification. 

Specificity  experiments  showed  no  cross  reactivity  of  our  assay  with  P. 
falciparum,  P.  vivax,  P.  malariae,  P.  cynomolgi,  P.  knowlesi,  P.  fragile,  and  DNA  from 
uninfected  human  blood  even  when  qPCR  was  performed  for  60  cycles.  The  complete 
lack of background amplification from human and other malaria parasite DNA verifies
the  exquisite  specificity  of  the  assay.  Further,  assay  performance  was  unchanged  in  the 
presence  of  DNA  from  other  malaria  parasite  species.  This  is  of  particular  importance  for 
P.  ovale,  as  this  malaria  species  is  often  found  as  a  co-infection  with  other  malaria 
species.  Interestingly,  our  rbp2  qPCR  assay  also  detected  DNA  obtained  from  P. 
simiovale.  As  P.  simiovale  rbp2  sequence  information  is  not  available,  we  attempted  to 
amplify  the  full-length  P.  simiovale  rbp2  gene  using  additional  primers  based  on  the  P. 
ovale  rbp2  gene.  However,  we  were  unable  to  amplify  the  full  P.  simiovale  rbp2  gene, 
suggesting  the  P.  ovale  and  P.  simiovale  rbp2  genes  may  be  similar  but  not  identical. 
These  results  warrant  further  investigation  of  the  P.  simiovale  rbp2  and  additional 
specificity experiments of other P. ovale assays that may also unknowingly detect P.
simiovale. 

Of the 22 samples identified as P. ovale positive by expert microscopy, three
samples (Po9, Po11, Po18) failed to amplify at one or two of the three loci tested despite
multiple attempts (Table 11). However, the rbp2 gene was successfully amplified for all
22 samples as was a human-specific RNaseP endogenous control. These data, along with
the  parasitemia  data  from  multiple  expert  microscopists,  indicate  that  the  22  samples 
were  P.  ovale  positive  and  that  DNA  template  quality  was  unlikely  to  be  the  cause  of  the 
failed  amplifications  at  the  tra  and  ssrRNA  loci.  The  inability  to  successfully  amplify  at 
all  three  loci  could  be  explained  by  several  reasons  including:  sequence  polymorphisms, 
template  degradation,  low  P.  ovale  density,  and  inter-laboratory  variability  due  to 
reagents,  equipment,  and  personnel.  Additional  investigation  into  potential  reasons  for 
the  failure  to  amplify  at  all  loci  was  prevented  due  to  limited  sample  volume. 

The limited correlation between microscopy and rbp2 qPCR results (Figure 11) is
not  surprising  as  parasitemia  calculations  for  P.  ovale  human  samples  at  low  parasitemias 
are  notoriously  difficult,  particularly  in  co-infected  samples  (84).  Our  P.  ovale  positive 
samples  from  Western  Kenya  are  all  co-infected  with  either  P.  falciparum  or  P.  malariae, 
thus  likely  complicating  the  microscopy  quantitation  further.  Variation  between 
parasitemia  and  rbp2  copy  number  could  also  be  explained  by  the  P.  ovale  parasite  stage. 
For  example,  a  P.  ovale  ring  stage  counts  as  a  single  parasite  by  microscopy  and  DNA 
extracted  from  a  P.  ovale  ring  stage  parasite  represents  one  genome.  However,  a  P.  ovale 
schizont  is  counted  as  a  single  parasite  by  microscopy  but  DNA  extracted  from  a  P.  ovale 
schizont  may  contain  up  to  14  genomes.  This  is  a  limitation  of  our  study,  as  any 
relationship  between  P.  ovale  parasitemia  and  rbp2  copy  number  based  on  qPCR  would 
depend  on  the  parasite  stages  observed  under  the  microscope  and  present  in  the  blood 
sample  obtained  for  DNA  extraction. 
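
The stage effect described above can be illustrated with a short calculation. This is a minimal sketch, not part of the study's analysis: the sample compositions are hypothetical, every schizont is assumed to carry the upper bound of 14 genomes, and each genome is assumed to contribute one rbp2 copy.

```python
# Genome copies contributed by a single parasite of each stage.
# The ring value and the schizont upper bound (14) come from the text above;
# treating every schizont as carrying the full 14 genomes is a simplification.
GENOMES_PER_PARASITE = {"ring": 1, "schizont": 14}


def expected_genomes(stage_counts):
    """Return (parasites counted by microscopy, genome copies available to qPCR)."""
    parasites = sum(stage_counts.values())
    genomes = sum(GENOMES_PER_PARASITE[stage] * n for stage, n in stage_counts.items())
    return parasites, genomes


# Two hypothetical blood samples with identical microscopy counts.
all_rings = {"ring": 100, "schizont": 0}
mixed = {"ring": 90, "schizont": 10}

for label, counts in (("all rings", all_rings), ("10% schizonts", mixed)):
    parasites, genomes = expected_genomes(counts)
    print(f"{label}: {parasites} parasites, {genomes} genomes "
          f"({genomes / parasites:.2f} genomes per counted parasite)")
```

Under these assumptions, two samples with the same microscopy count can differ more than two-fold in template copies, which is one way the qPCR and microscopy estimates can diverge.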

Utilizing a plasmid standard curve for qPCR assays provides an efficient method
for standardizing assays that does not require culturing organisms or using human
samples. However, recent studies have highlighted important concerns regarding the
plasmid template conformation that could lead to quantification bias of plasmid template
by qPCR (127; 168). After linearizing our template plasmid to compare with a
non-linearized plasmid standard curve, we found no difference in Cq value, R², slope, or
PCR efficiency with the rbp2 qPCR assay. This is in agreement with another recent study,
which also found no difference in plasmid standard curves based on the plasmid
conformation (linearized versus non-linearized) (215). These results indicate that the
effect of plasmid conformation on standard curve quantification may be assay specific. In
addition to plasmid conformation, several additional quality control factors were
optimized, including plasmid isolation methods, purification, and storage, and
appropriate laboratory protocols were developed to minimize freeze-thawing, handling,
and contamination.
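
The standard curve parameters compared above (slope, R², and PCR efficiency) can be derived from a dilution series as sketched below. The Cq values are hypothetical and chosen only for illustration, the function names are not from the study, and the efficiency formula E = 10^(-1/slope) - 1 is the conventional one rather than a method stated in the text.

```python
import math


def fit_standard_curve(copies_per_ul, cq_values):
    """Least-squares fit of Cq against log10(copies); returns slope, intercept, R^2, efficiency."""
    x = [math.log10(c) for c in copies_per_ul]
    y = list(cq_values)
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    efficiency = 10 ** (-1 / slope) - 1  # 1.0 corresponds to perfect doubling per cycle
    return slope, intercept, r_squared, efficiency


def copies_from_cq(cq, slope, intercept):
    """Invert the fitted curve to estimate copies/ul for an unknown sample."""
    return 10 ** ((cq - intercept) / slope)


# Hypothetical Cq values for a ten-fold plasmid dilution series (copies/ul).
dilutions = [100_000, 10_000, 1_000, 100, 10]
cqs = [20.1, 23.5, 26.8, 30.2, 33.5]

slope, intercept, r_squared, efficiency = fit_standard_curve(dilutions, cqs)
print(f"slope={slope:.3f} intercept={intercept:.2f} "
      f"R^2={r_squared:.4f} efficiency={efficiency:.1%}")
```

Fitting the linearized and non-linearized plasmid series separately and comparing the four returned values would reproduce the kind of comparison described in the paragraph above.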

Conventional PCR assays targeting the multi-copy small subunit rRNA (ssrRNA) genes
are sensitive methods to detect and differentiate malaria species (267). Initial
P. ovale-specific ssrRNA PCR protocols showed limited capability to detect both P. ovale
subspecies and have since been adapted to target regions conserved between the two
subspecies (58; 103; 232). Although ssrRNA conventional PCR protocols have shown
high sensitivity and specificity for malaria detection, we aimed to develop a novel
P. ovale-specific assay based on a gene target that is found only in P. ovale and is
absent from other malaria species infecting humans. We believe this approach enhances the
specificity of our P. ovale-specific assay and eliminates the potential for nonspecific
amplification of non-P. ovale species. Additionally, allelic variation within the ssrRNA
genes of P. ovale curtisi and P. ovale wallikeri may limit the ability of ssrRNA-based
assays to capture all P. ovale infections due to sequence polymorphisms (161). We found
no allelic variation in the primer and probe-binding regions of the rbp2 gene from 22
P. ovale positive samples, indicating the potential utility of rbp2 for P. ovale
subspecies detection.
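
A check of this kind can be sketched against the reference amplicons. The sequences below are copied from Table 9 (primers and probe) and Table 14 (rbp2 amplicons); the helper names are illustrative only, and the sketch tests exact matches against the two reference sequences, not against the 22 sequenced field samples.

```python
# Primer and probe sequences from Table 9 (5'-3').
FWD_PRIMER = "CCACAGATAAGAAGTCTCAAGTACGATATT"
REV_PRIMER = "TTGGAGCACTTTTGTTTGCAA"
PROBE = "TGAATTGCTAAGCGATATC"

# rbp2 amplicon sequences from Table 14 (5'-3').
AMPLICONS = {
    "P. ovale curtisi":
        "CCACAGATAAGAAGTCTCAAGTACGATATTAATGAATTGCTAAGCGATATCAATTGCAAACAAAAGTGCTCCAA",
    "P. ovale wallikeri":
        "CCACAGATAAGAAGTCTCAAGTACGATATTAATGAATTGCTAAGCGATATCATTTGCAAACAAAAGTGCTCCAA",
}


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def binding_sites(amplicon):
    """0-based start of each exact binding site on the sense strand (-1 if absent)."""
    return {
        "forward": amplicon.find(FWD_PRIMER),
        "probe": amplicon.find(PROBE),
        "reverse": amplicon.find(reverse_complement(REV_PRIMER)),
    }


def mismatch_positions(a, b):
    """0-based positions at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


for name, amplicon in AMPLICONS.items():
    print(name, len(amplicon), binding_sites(amplicon))

print("subspecies differences at:",
      mismatch_positions(AMPLICONS["P. ovale curtisi"], AMPLICONS["P. ovale wallikeri"]))
```

Both reference amplicons contain exact binding sites for the forward primer, probe, and reverse primer, and the single position at which the two subspecies differ lies between the probe and the reverse primer.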

While nested PCR is often utilized to enhance sensitivity for malaria PCR
detection, we chose a single-step qPCR protocol, as a nested PCR approach requires
additional labor and cost to perform the second PCR. Nested PCR also increases the risk
of laboratory contamination with PCR product and requires separate laboratory space to
minimize that risk. Our P. ovale-specific qPCR assay maintains high sensitivity while
minimizing the additional cost, labor, designated laboratory space, and potential PCR
product contamination that can be associated with nested PCR protocols.

Our P. ovale-specific qPCR assay provides several advantages for our future
epidemiological studies of this neglected, yet clinically relevant, malaria parasite
species. First, fast qPCR conditions allow a reaction to be completed in less than 1 hour.
Second, the qPCR platform bypasses the need for gel electrophoresis, reducing the risk of
amplicon contamination of the laboratory. Third, the use of a hydrolysis probe increases
specificity compared to double-stranded DNA (dsDNA) binding dye-based qPCR product
detection. Our P. ovale-specific rbp2 qPCR assay can be utilized to better characterize
the presence, parasitemia, and geographical distribution of P. ovale and its contribution
to mixed-species infections and to clinical disease in malaria endemic regions.

Table 9. Primer and probe sequences utilized for conventional PCR and qPCR
experiments.

Target                                 Primer/Probe  Primer/Probe Sequence                    Reference
Tryptophan-rich antigen (tra)          PoTRAfwd3     5'-GCACAAAAATGGTGCTAACC-3'               Oguike et al. 2011 (213)
                                       PoTRArev3     5'-ATCCATTTACCTTCCATTGC-3'
Small subunit rRNA (ssrRNA)            rOVA1WC       5'-TGTAGTATTCAAACGCAGT-3'                Fuehrer et al. 2012 (103)
                                       rOVA2WC       5'-TATGTACTTGTTAAGCCTTT-3'
Reticulocyte binding protein 2 (rbp2)  PoRBP2fwd1    5'-CCACAGATAAGAAGTCTCAAGTACGATATT-3'
                                       PoRBP2rev1    5'-TTGGAGCACTTTTGTTTGCAA-3'
                                       PoRBP2p       5'-6FAM-TGAATTGCTAAGCGATATC-MGB-3'

Table 10. GenBank accession numbers used for DNA alignment of the P. ovale curtisi
and P. ovale wallikeri tra, rbp2, and ssrRNA DNA sequences.

Target                                 Reference Sequence         GenBank Accession Number
Tryptophan-rich antigen (tra)          P. ovale curtisi type 1    HM594182
                                       P. ovale curtisi type 2    HM594183
                                       P. ovale wallikeri type 1  HM594180
                                       P. ovale wallikeri type 2  HM594181
Small subunit rRNA (ssrRNA)            P. ovale curtisi           JF894405
                                       P. ovale wallikeri         JF894406
Reticulocyte binding protein 2 (rbp2)  P. ovale curtisi           GU813971
                                       P. ovale wallikeri         GU813972


Table 11. P. ovale subspecies identification by DNA sequencing of the tryptophan-rich
antigen (tra) gene, the reticulocyte binding protein 2 (rbp2) gene, and the
small subunit ribosomal RNA (ssrRNA) gene.

Sample  Co-infecting malaria species                  P. ovale        Tryptophan-rich antigen    Reticulocyte binding  Small subunit rRNA
ID      (parasites/µl)                                microscopy      (tra)                      protein 2 (rbp2)      (ssrRNA)
                                                      (parasites/µl)
Po1     P. falciparum (7334); P. malariae (110.8)     57.6            P. ovale curtisi type 1    P. ovale curtisi      P. ovale curtisi
Po2     P. falciparum (653.4); P. malariae (114)      156             P. ovale curtisi type 1    P. ovale curtisi      P. ovale curtisi
Po3     P. falciparum (67121.1)                       458             P. ovale curtisi type 2    P. ovale curtisi      P. ovale curtisi
Po4     P. falciparum (571.5); P. malariae (56)       42              P. ovale wallikeri type 2  P. ovale wallikeri    P. ovale wallikeri
Po5     P. falciparum (101.8)                         121.78          P. ovale wallikeri type 2  P. ovale wallikeri    P. ovale wallikeri
Po6     P. falciparum (306.1); P. malariae (1320)     2321.78         P. ovale curtisi type 1    P. ovale curtisi      P. ovale curtisi
Po7     P. falciparum (3284.2); P. malariae (1648.6)  69.33           P. ovale curtisi type 1    P. ovale curtisi      P. ovale curtisi
Po8     P. falciparum (4568); P. malariae (320)       296.35          P. ovale curtisi type 1    P. ovale curtisi      P. ovale curtisi
Po9     P. falciparum (515); P. malariae (255.3)      16              No data                    P. ovale curtisi      No data
Po10    P. falciparum (1897.3)                        456.89          P. ovale curtisi type 2    P. ovale curtisi      P. ovale curtisi
Po11    P. falciparum (412.7); P. malariae (583.3)    16              P. ovale curtisi type 1    P. ovale curtisi      No data
Po12    P. falciparum (158.9); P. malariae (48)       331.26          P. ovale wallikeri type 1  P. ovale wallikeri    P. ovale wallikeri
Po13    P. falciparum (613.8); P. malariae (453.1)    365.54          P. ovale curtisi type 1    P. ovale curtisi      P. ovale curtisi
Po14    P. falciparum (16703.3); P. malariae (32)     157.33          P. ovale wallikeri type 1  P. ovale wallikeri    P. ovale wallikeri
Po15    P. falciparum (28976)                         3738.88         P. ovale wallikeri type 1  P. ovale wallikeri    P. ovale wallikeri
Po16    P. falciparum (3889.9); P. malariae (211.8)   32              P. ovale wallikeri type    P. ovale wallikeri    P. ovale wallikeri
Po17    P. falciparum (16); P. malariae (24)          1118.08         P. ovale curtisi type 1    P. ovale curtisi      P. ovale curtisi
Po18    P. falciparum (382.1); P. malariae (52.4)     26.67           No data                    P. ovale curtisi      No data
Po19    P. falciparum (8954.2); P. malariae (409.6)   58.67           P. ovale curtisi type 1    P. ovale curtisi      P. ovale curtisi
Po20    P. falciparum (197.3); P. malariae (304)      350.61          P. ovale wallikeri type 1  P. ovale wallikeri    P. ovale wallikeri
Po21    P. falciparum (4299.5); P. malariae (172)     84.36           P. ovale wallikeri type 1  P. ovale wallikeri    P. ovale wallikeri
KSI     P. falciparum (no data)                       No data         P. ovale wallikeri type 1  P. ovale wallikeri    P. ovale wallikeri


Table 12. Five P. ovale positive samples contained unique tra gene polymorphisms
identified by DNA sequencing.

(GenBank accession number)    Nucleotide Position
                              71    99    171-172    523    595

a P. ovale wallikeri type 1 (HM594180, nucleotides 1-1171) was utilized as a reference
for DNA sequence alignment of P. ovale wallikeri positive samples with unique
polymorphisms (Po05, Po12, and Po20).

b Dashes (-) indicate lack of an insertion. Samples Po12 and Po20 contained an 18 base
pair insertion between nucleotide positions 171 and 172 based on the reference sequence.

c P. ovale curtisi type 1 (HM594182, nucleotides 1-1117) was utilized as a reference for
DNA sequence alignment of P. ovale curtisi positive samples with unique
polymorphisms (Po06 and Po07).


Table 13. Repeatability and reproducibility of the rbp2 plasmid standard curves
determined via Cq values from six separate qPCR experiments.

Rbp2 plasmid copies/µl           100,000    10,000     1,000       100        10
Within-run repeatability (%CV)   0.25-0.74  0.41-1.00  0.048-1.47  0.00-1.52  0.19-2.23
Between-run repeatability (%CV)  2.21       1.53       1.17        1.46       3.43
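
The %CV values in Table 13 are coefficients of variation of Cq. A minimal sketch of that calculation follows; the Cq values are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not state which form was used.

```python
import statistics


def percent_cv(values):
    """Coefficient of variation in percent: sample standard deviation / mean * 100."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical Cq replicates for one plasmid concentration within a single run.
within_run = [26.7, 26.8, 26.9]

# Hypothetical mean Cq for the same concentration across six separate runs.
run_means = [26.5, 26.8, 27.1, 26.6, 27.0, 26.8]

print(f"within-run %CV:  {percent_cv(within_run):.2f}")
print(f"between-run %CV: {percent_cv(run_means):.2f}")
```

Within-run repeatability is reported in Table 13 as a range because each of the six runs yields its own %CV per concentration, whereas the between-run value is a single figure computed across runs.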
Table 14. P. ovale curtisi and P. ovale wallikeri reticulocyte binding protein 2 (rbp2)
DNA sequences.

P. ovale subspecies  Rbp2 DNA sequence (5'-3')
P. ovale curtisi     CCACAGATAAGAAGTCTCAAGTACGATATTAATGAATTGCTAAGCGATATCAATTGCAAACAAAAGTGCTCCAA
P. ovale wallikeri   CCACAGATAAGAAGTCTCAAGTACGATATTAATGAATTGCTAAGCGATATCATTTGCAAACAAAAGTGCTCCAA


[Figure 7 image: alignment of the P. ovale curtisi and P. ovale wallikeri rbp2 sequences, nucleotide positions 1 to 793]

Figure 7. P. ovale reticulocyte binding protein 2 (rbp2) sequence alignment.

The P. ovale curtisi (GU813971) and P. ovale wallikeri (GU813972) rbp2
sequences were aligned using the EMBL-EBI Clustal Omega program and visualized
in Jalview with the default Jalview nucleotide color scheme (green for adenine,
orange for cytosine, red for guanine, and blue for thymine). Primers and probe
were designed based on conserved regions between the two P. ovale subspecies.
The forward (PoRBP2fwd1) and reverse (PoRBP2rev1) primers are indicated by
arrows, and the hydrolysis probe (PoRBP2p) binding site (boxed) is located
between the forward and reverse primers.

Figure 8. P. ovale rbp2 qPCR dynamic range.

A ten-fold serial dilution of rbp2 plasmid (1 to 100,000 copies/µl) is shown in the
amplification plot. The cycle threshold was determined automatically by the ABI
7500 Fast system software. The negative control sample (red line) shows
no amplification over the cycle threshold for 60 cycles.

[Figure 9 axis label: Log rbp2 copy number per microliter]

Figure 9. P. ovale rbp2 plasmid standard curve.

A representative standard curve demonstrates linearity based on 10-fold serial
dilutions (1 to 100,000 copies/µl) of rbp2 plasmid.

Figure 10. P. ovale rbp2 qPCR specificity.

Serial dilutions of rbp2 plasmid were spiked with P. falciparum DNA (10,000
parasites/µl) or P. vivax DNA (517 parasites/µl). Cq values were unchanged in the
presence of DNA from additional malaria parasite species (P=0.9993).

[Figure 11 axis label: Microscopy (log parasites/µl)]

Figure 11. Comparison of microscopy and P. ovale rbp2 qPCR results.

P. ovale parasitemias based on microscopy (log parasites/µl) were compared to
rbp2 qPCR results (log rbp2 copy number/µl). A limited correlation was found
between parasitemia and rbp2 plasmid copy number (R² = 0.6595).


Abstract


In regions of stable malaria transmission, human hosts can be infected with
multiple strains of Plasmodium falciparum. The number of unique strains present within
an individual is referred to as the Complexity of Infection (COI). Multiclonal (COI>1)
P. falciparum infections can influence malaria clinical outcomes, response to drug
treatment, and within-host parasite dynamics. In this study, we utilized an
amplicon-based deep sequencing approach to detect minor frequency P. falciparum
haplotypes and estimate the COI within individual samples collected as part of the 2007
Demographic and Health Survey in the Democratic Republic of Congo (DRC). We targeted
malaria positive dried blood spot samples for PCR amplification and deep sequencing of
the polymorphic P. falciparum apical membrane antigen 1 (pfama1) gene. Deep sequencing
results were analyzed using the SeekDeep targeted amplicon analysis pipeline. We
identified a total of 88 unique pfama1 haplotypes and found that 64.5% of individuals had
multiclonal (COI>1) P. falciparum infections. We found no difference in P. falciparum
COI based on HIV status, age, or sex, and no geospatial clustering of pfama1
haplotypes within DRC provinces was identified. Overall, high sequence diversity within
the pfama1 gene was observed in P. falciparum parasites from the DRC.


Introduction


Complexity  of  Malaria  Infections 

Several molecular epidemiological studies have demonstrated that multiple
Plasmodium falciparum strains circulate in malaria holoendemic regions (31; 52; 104;
138; 145; 178; 265). The number of genetically distinct P. falciparum strains within a
single infected individual is referred to as the Complexity of Infection (COI) (44). The
intensity of malaria transmission corresponds to the P. falciparum COI, as individuals
living in malaria holoendemic regions typically have higher P. falciparum COIs
compared to those in areas with seasonal or low malaria endemicity (310). Multiclonal
(COI>1) infections are the result of a single mosquito inoculation with several
P. falciparum strains or several mosquito inoculations with different P. falciparum
strains (31; 86). A range of COIs has been described in the literature, with the
experimental methodology, geographic location, and transmission intensity all factoring
into the ability to harbor or maintain detectable multiclonal P. falciparum infections
(31; 41; 44; 134; 138; 178; 254; 266). The majority of research focused on malaria COI is
limited to P. falciparum; however, recent studies have also begun to elucidate the
genetic complexity of multiclonal P. vivax infections (30; 75; 107; 169; 170).

Detection of multiclonal P. falciparum infections is important, as these complex
infections have been shown to impact clinical outcomes, indicate malaria transmission
intensity, and influence malaria parasite evolution (26; 52; 72; 174; 206; 310). Several
studies have demonstrated that multiclonal P. falciparum infections are associated with
increased risk of clinical malaria (52; 184; 212; 246); however, other studies have shown
that multiclonal infections protect against clinical malaria (26; 45; 96; 146; 199; 265).
Differences in the association between P. falciparum COI and clinical malaria may be
influenced by malaria transmission intensity (97). Thus, the impact of P. falciparum COI
on clinical malaria outcome remains unclear, and further exploration is needed to
understand the within-host dynamics of malaria parasites as they relate to disease in
humans (104).

Malaria  population  genetics  studies  have  recently  described  the  utility  of  P. 
falciparum  COI  as  a  marker  for  malaria  transmission  intensity  (310).  As  malaria 
transmission  falls,  due  to  seasonality  or  malaria  intervention  strategies,  the  P.  falciparum 
COI  in  a  population  has  been  shown  to  decrease  as  well  (72;  104;  306).  Therefore, 
detection  of  multiclonal  P.  falciparum  infections  and  surveillance  of  P.  falciparum  COI 
over time are potentially useful tools for measuring malaria transmission before and after
implementation  of  malaria  control  programs  (206). 
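
As an illustration of how COI could be summarized for surveillance, the sketch below compares two hypothetical survey rounds. The per-individual COI values are invented for the example and the summary statistics chosen (mean COI and the fraction of multiclonal infections) are assumptions about what a programme might track, not results from this study.

```python
from statistics import mean


def summarize_coi(coi_values):
    """Summarize per-individual COI estimates for one survey round."""
    multiclonal = sum(1 for coi in coi_values if coi > 1)
    return {
        "n": len(coi_values),
        "mean_coi": mean(coi_values),
        "multiclonal_fraction": multiclonal / len(coi_values),
    }


# Hypothetical COI estimates before and after a control programme.
before = [1, 2, 3, 4, 2, 1, 3, 2]
after = [1, 1, 2, 1, 2, 1, 1, 1]

for label, round_values in (("before", before), ("after", after)):
    print(label, summarize_coi(round_values))
```

A fall in both the mean COI and the multiclonal fraction between rounds would be consistent with reduced transmission, although sampling design and genotyping sensitivity would need to be held constant for the comparison to be meaningful.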

In  multiclonal  infections,  P.  falciparum  undergoes  genetic  recombination  during 
the  obligate  sexual  cycle  in  the  mosquito  vector  to  produce  genetically  distinct  offspring, 
facilitating  the  evolution  of  the  malaria  parasite  (132;  197).  Complex  multiclonal  malaria 
infections  are  therefore  a  potential  source  for  the  generation  of  novel  parasite  genotypes 
capable of evading host immune responses or antimalarial therapy (120; 174).

Factors  that  Influence  P.  falciparum  COI 

In  regions  with  high  malaria  transmission,  P.  falciparum  COI  is  typically  low  in 
young  children  (under  1  year)  (44;  184;  265),  increases  as  children  mature,  and  then 
decreases  when  individuals  reach  adulthood  (44;  184;  265).  The  relationship  between 
COI  and  age  in  malaria  endemic  regions  is  likely  due  to  the  development  of  naturally 
acquired  immunity  to  malaria  and  the  ability  of  these  semi-immune  individuals  to  control 
malaria parasitemia (310). Malaria vaccines have also been shown to reduce malaria COI
in vaccinated individuals compared to unvaccinated controls, which further suggests that
the  development  of  malaria  immunity  aids  in  the  control  of  multiclonal  infections  (40; 
91). 

Pregnant women in malaria endemic regions have higher P. falciparum COIs
than non-pregnant women, younger pregnant women have higher COIs than older pregnant
women (254), and primigravid women have higher COIs than multigravid women (41; 159).
Further, P. falciparum COI in pregnancy has been shown to impact the health of both the
mother and child. For example, in primigravid women, multiclonal (COI>1) P. falciparum
infections in the placenta were associated with low birth weight babies compared to
monoclonal (COI=1) placental infections (114). A study in Mozambique found that pregnant
women with higher COIs had increased prevalence of anemia compared to pregnant women
with lower COIs (41).

Malaria  and  HIV  Co-infections 

Regions  endemic  for  both  malaria  and  HIV  pose  a  significant  challenge  for  the 
control  of  both  diseases.  Mathematical  models  estimate  3  million  cases  of  malaria  and 
65,000  malaria  deaths  can  be  attributed  to  HIV  in  Africa  annually  (151).  Furthermore, 
epidemiological  modeling  suggests  that  HIV  and  malaria  co-infections  fuel  the 
transmission  of  both  pathogens  in  endemic  regions  (23;  308).  The  impact  of  HIV  on 
clinical  malaria  depends  on  age,  level  of  immunosuppression,  pregnancy,  and  malaria 
transmission  intensity  (reviewed  in  (99)).  In  regions  holoendemic  for  malaria 
transmission,  co-infection  with  HIV  in  adults  is  associated  with  increased  prevalence  of 
clinical malaria, increased malaria parasite density, and increased prevalence of malaria
parasitemia (100; 131; 137; 223; 287; 309; 316). The risks of clinical malaria and
increased  malaria  parasitemias  in  HIV  infected  individuals  are  inversely  correlated  with 
CD4  T  cell  counts  (316). 

Evidence  of  the  impact  of  HIV  on  P.  falciparum  COI  is  limited.  A  study  in 
pregnant  women  in  Malawi  demonstrated  that  HIV  positive  women  had  significantly 
higher P. falciparum COIs compared to HIV negative women (152). In contrast, a
study  in  the  Central  African  Republic  (CAR)  demonstrated  that  HIV  seropositivity  was 
significantly  associated  with  lower  P.  falciparum  COIs  compared  to  HIV  seronegative 
individuals  during  clinical  malaria  episodes  (81).  The  impact  of  HIV  on  P.  falciparum 
COI  in  malaria  asymptomatic  individuals,  who  make  up  the  majority  of  malaria 
infections  in  endemic  regions  and  are  the  reservoir  for  malaria  transmission,  remains 
unclear. 

Methods  for  Determining  COI 

Numerous genetic tools and strategies have been employed to analyze malaria
COI. The most widely utilized method for determining P. falciparum COI is PCR
amplification of polymorphic genes, such as the merozoite surface protein genes (msp1,
msp2) and glurp (52; 98; 145; 149; 254; 266). These PCR-based methods rely on DNA
sequence length polymorphisms of the target gene, which are visualized via gel
electrophoresis, and the COI within an individual sample is calculated by counting the
number of distinct bands. However, this approach cannot distinguish P. falciparum
strains that differ by only a few nucleotides in length or only by single
nucleotide polymorphisms (SNPs). Further, a multicenter comparison of related PCR
genotyping methods for COI demonstrated high variability in the number of P.
falciparum strains detected between research laboratories due to differences in sample
preparation and PCR conditions (95). Therefore, more sensitive and standardized
methods for the detection of P. falciparum COI will improve our understanding of the
impact of multiclonal infections on malarial disease and epidemiology.
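
The limitation of band counting can be shown with a toy example. The three allele sequences and the `min_resolvable_bp` parameter are hypothetical and stand in for real amplicons and for the resolution of a gel; the sketch is not a model of any specific genotyping protocol.

```python
def coi_by_length(haplotypes, min_resolvable_bp=1):
    """COI as band counting would report it: fragments closer in length than
    min_resolvable_bp are assumed to run as a single band."""
    bands = 0
    last_band_length = None
    for length in sorted(len(seq) for seq in haplotypes):
        if last_band_length is None or length - last_band_length >= min_resolvable_bp:
            bands += 1
            last_band_length = length
    return bands


def coi_by_sequence(haplotypes):
    """COI when every distinct sequence is resolved, as with amplicon sequencing."""
    return len(set(haplotypes))


# Hypothetical alleles in one sample: the first two differ by a single SNP,
# the third carries a 4 bp length difference.
sample = ["ACGTACGTAC", "ACGTTCGTAC", "ACGTACGTACGTAC"]

print("by sequence:", coi_by_sequence(sample))
print("by length (ideal gel):", coi_by_length(sample))
print("by length (5 bp resolution):", coi_by_length(sample, min_resolvable_bp=5))
```

The SNP-only pair collapses into one band even on an ideal gel, and the small length difference is lost as soon as the assumed resolution is coarser than the difference.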

Novel  approaches  based  on  next-generation  sequencing  (NGS)  technologies  have 
increased  capabilities  to  detect  low  frequency,  or  minor  variant,  P.  falciparum  strains  and 
therefore  have  enhanced  capacity  to  characterize  the  COI  within  an  individual  or 
population  (31;  72;  104;  105;  134;  242).  These  methods  employ  NGS  technology  to 
generate  millions  of  sequence  reads  and  utilize  sophisticated  bioinformatics  tools  to 
analyze large data sets. NGS has several advantages over conventional PCR and shotgun
sequencing for detecting low frequency P. falciparum infections (134). For example,
NGS generates millions of sequencing reads that can be analyzed to detect multiple SNPs
that  occur  at  different  frequencies  within  the  parasite  population  (129;  310).  NGS  is  also 
useful  for  estimating  the  frequency  or  proportion  of  different  P.  falciparum  strains  based 
on  sequence  read  frequencies  to  identify  major  and  minor  strains  present  within  a  single 
infection  or  population  (134).  Finally,  methods  for  barcoding  samples  and  the  decreasing 
costs  of  NGS  reagents  and  equipment  have  significantly  reduced  the  overall  price  of  NGS 
technology  for  large  malaria  population  genomic  and  epidemiological  studies.  In  this 
study, we utilized an NGS deep sequencing approach to detect multiclonal P. falciparum
infections  and  analyze  P.  falciparum  COI. 
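
The idea of estimating strain proportions and COI from read frequencies can be sketched as follows. This is a deliberate simplification and not the SeekDeep algorithm: reads are assumed to be already assigned to haplotype labels, and the `min_frequency` cutoff of 1% is a hypothetical threshold chosen for illustration.

```python
from collections import Counter


def haplotype_frequencies(reads):
    """Fraction of reads assigned to each haplotype within one sample."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {haplotype: n / total for haplotype, n in counts.items()}


def estimate_coi(reads, min_frequency=0.01):
    """Count haplotypes at or above min_frequency; rarer ones are treated as noise."""
    kept = {
        haplotype: freq
        for haplotype, freq in haplotype_frequencies(reads).items()
        if freq >= min_frequency
    }
    return len(kept), kept


# Hypothetical sample: one major haplotype, one minor haplotype, and a very
# rare haplotype that falls below the assumed cutoff.
reads = ["H1"] * 900 + ["H2"] * 95 + ["H3"] * 5

coi, kept = estimate_coi(reads)
print("COI:", coi, "haplotype frequencies:", kept)
```

The choice of cutoff trades sensitivity for minor variants against the risk of counting PCR or sequencing errors as additional strains, so the estimated COI should always be reported together with the threshold used.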

Apical  Membrane  Antigen  1  (AMA1) 

The P. falciparum apical membrane antigen 1 (pfama1) gene encodes the 83 kDa type I
integral membrane AMA1 microneme protein (69; 125). AMA1 is expressed in the last
four hours of erythrocytic development in the dividing schizont, where it is secreted onto
the merozoite surface before the parasite ruptures the host erythrocyte (202). The AMA1
prodomain is cleaved on the surface of the merozoite, resulting in a 66 kDa
membrane-bound AMA1 protein (202). AMA1 comprises a signal sequence, three ectodomains
(N-terminal domain I, central domain II, and C-terminal domain III) linked by eight
disulfide bonds, and a transmembrane domain (Figure 12) (125; 167; 201).

AMA1 is postulated to be involved in host cell invasion, as AMA1-specific antibodies
block the ability of the malaria parasite to infect host erythrocytes (148; 297). However,
studies to understand the function of P. falciparum AMA1 in erythrocyte invasion are
extremely difficult, as pfama1 is an essential gene and therefore cannot be targeted for
knockout experiments (36; 122; 252; 302). Several studies using a related apicomplexan
parasite, Toxoplasma gondii, indicate that AMA1 is important for host cell attachment
and formation of the moving junction (MJ), a complex of proteins involved in host cell
invasion (28; 122; 194; 195). AMA1 has been shown to interact with other erythrocyte
invasion proteins, including the rhoptry neck 2 (RON2) protein, which is also involved in
MJ formation (271). A recent study demonstrated that conditionally reducing P.
falciparum AMA1 expression to 20% of wild-type levels impaired the ability of P.
falciparum to form an MJ during erythrocyte invasion and to reseal the erythrocyte after
invasion (321). Interestingly, AMA1 is also expressed on the surface of the sporozoite
and may facilitate hepatocyte invasion (260).

Individuals routinely exposed to malaria develop AMA1-specific antibodies that
have been shown to increase with age and predominantly target domains I and II (61; 68;
236; 238; 298). AMA1-specific antibodies protect against malaria infection and clinical
disease in endemic areas (25; 236; 301). In addition to the magnitude of the
AMA1-specific antibody response, its breadth against several P. falciparum strains
has also been shown to mitigate the risk of developing clinical malaria (33;
217). Natural exposure to malaria parasites also induces an AMA1-specific T-cell
response, although this response has been shown to be short-lived compared to
AMA1-specific antibody levels (304).

AMA1 remains a viable malaria vaccine antigen candidate, despite its high levels
of heterogeneity. Several AMA1-based vaccine challenge studies have been conducted
with different adjuvants, animal models, and challenge doses, resulting in generally
positive clinical outcomes (reviewed in (238)). A phase II clinical trial recently
conducted in Mali utilizing an AMA1-based vaccine showed no significant reduction in
overall clinical malaria; however, the authors reported a significant reduction in clinical
malaria caused by the vaccine strain (270; 295). These results demonstrate that protection
against clinical malaria based on AMA1 is highly strain-specific, and therefore
circulating P. falciparum strain diversity must be considered in order to develop an
effective malaria vaccine based on the AMA1 antigen. To broaden protection against the
multitude of heterologous P. falciparum strains circulating in malaria endemic regions,
several efforts are underway to develop AMA1-based vaccines that incorporate AMA1
antigens from multiple strains (33; 85; 88; 294).

The 1.8 kilobase (kb) single copy pfama1 gene is located on chromosome 11 in
the P. falciparum genome. The pfama1 gene is highly polymorphic, containing several
single nucleotide polymorphisms (SNPs), the majority of which are located in domain I
(69; 87; 90; 93; 106; 147; 179; 227; 235; 281). The diversity within the pfama1 gene is
maintained by balancing selection, which is likely due to immune selection within the
host (30; 218; 235; 281). Studies in malaria endemic regions have described over 60
polymorphic sites within the pfama1 gene, and over 200 distinct pfama1 haplotypes were
sequenced from a malaria endemic region in Mali (89; 281). Based on the highly
polymorphic nature of the pfama1 gene and the relevance of its diversity to the
development of an AMA1-based malaria vaccine, we chose to target pfama1 for deep
sequencing.

Malaria  in  the  Democratic  Republic  of  Congo 

Malaria is a leading cause of morbidity and mortality in the Democratic Republic
of Congo (DRC), with 100% of a population of over 79 million people considered to be at
risk for malaria (12; 18; 19). The DRC Ministry of Health and the 2014 President’s
Malaria Initiative (PMI) Malaria Operational Plan for DRC report that 97% of individuals
living in the DRC reside in regions that experience malaria transmission for 8-12
months each year, with over 95% of malaria infections estimated to be caused by P.
falciparum (13). The primary malaria vectors in the DRC include Anopheles gambiae
sensu stricto and An. funestus (50; 62; 63).

Demographic and Health Surveys (DHS) are utilized to collect data to identify
health and social needs, implement policies, and evaluate and monitor programs. The
DHS Program (www.dhsprogram.com) is funded by USAID along with several
partnering institutions and managed by ICF International. In 2007, a collaboratively
funded DHS was conducted in the DRC (EDS-RDC) by the DRC Ministry of Health and
Ministry of Planning in order to collect cross-sectional data on several health and social
indicators. The survey was the first of its kind in the DRC and included questions
concerning family planning, nutrition, mortality, domestic violence, fertility, maternal
and child health, demographic information, attitudes towards HIV/AIDS, and use of
insecticide-treated bednets (ITNs) to prevent malaria (2).

The EDS-RDC results indicated that ITN usage was low in 2007, with only nine
percent of households owning at least one ITN and six percent of children under five
sleeping under an ITN the night before the survey was performed (2). Approximately seven
percent  of  pregnant  women  reported  sleeping  under  an  ITN  the  night  before  the  survey 
and  twelve  percent  of  pregnant  women  received  a  single  dose  of  intermittent  preventive 
treatment  (IPT)  during  pregnancy  (2).  Only  five  percent  of  pregnant  women  reported 
receiving  the  recommended  two  doses  of  IPT  during  pregnancy  (2). 

Blood samples collected for HIV testing during the 2007 EDS-RDC were also
utilized to study malaria prevalence and epidemiology in the DRC (291). Using a panel
of real-time PCR (qPCR) assays, malaria prevalence was reported as 33.5% in adults (age
15-59 years), with the vast majority (>90%) of infections being P. falciparum either as a
monoinfection or as a co-infection with P. ovale or P. malariae (291). Multivariate
analysis based on data collected as part of the 2007 EDS-RDC identified variables that
were significantly related to malaria prevalence and showed males to be 24% more likely
to have malaria parasitemia. Additionally, individuals living further from urban areas had
higher malaria prevalence and wealthier individuals had a decreased risk of malaria
(191). Malaria positive samples collected as part of the 2007 EDS-RDC will
be utilized in this study for deep sequencing of the pfama1 gene and analysis of P.
falciparum COI based on demographic factors and geographical locations in the DRC.

Overall Study Aim


The  aim  of  this  study  is  to  utilize  a  deep  sequencing  approach  to  detect 
multiclonal  P.  falciparum  infections  from  blood  samples  collected  as  part  of  the  2007 
EDS-RDC.  To  address  this  aim,  we  will  1)  utilize  a  sensitive,  amplicon-based  deep 
sequencing approach targeting the polymorphic pfama1 gene to detect low frequency and
multiclonal  P.  falciparum  infections  and  2)  analyze  P.  falciparum  Complexity  of 
Infection  (COI)  based  on  demographic  factors  including  HIV  status,  geographic  location, 
age,  and  sex  in  the  DRC. 

Methods 

EDS-RDC  Methods  and  Sample  Collection 

The 2007 EDS-RDC was performed using a stratified sampling method in which
each of the 11 provinces was separated into three strata: major cities, towns, and rural
areas (187). Census data were utilized to identify neighborhoods in urban areas and
villages in rural areas (187). A total of 300 villages and neighborhoods, referred to as
clusters, were identified randomly to represent urban (41% of the population) and rural
areas according to census data (187). The 300 clusters contained 9,000 households, of which 99.3% were
successfully  interviewed  for  the  2007  EDS-RDC  (2).  Data  collection  for  the  EDS-RDC 
was  conducted  from  January  to  March  and  also  from  May  to  August  2007  (2).  The  survey 
included  9,995  women  (age  15-49)  and  4,757  men  (age  15-59)  from  each  of  the  11  DRC 
provinces,  representing  both  urban  and  rural  areas  (2).  To  protect  the  privacy  of  the 
participants,  geographical  coordinates  were  randomly  displaced  by  5  kilometers  in  rural 
areas  and  2  kilometers  in  urban  areas  (187). 

Blood samples were obtained via filter paper for voluntary HIV testing from all
the men and half of the women included in the survey (2; 187). The HIV testing protocol
included an initial ELISA test (Vironostika) and subsequent retesting of all positive
ELISA samples and 10% of negative ELISA samples with a second ELISA test
(Enzygnost) (3, Purvis, 2015 #3152). Samples with discordant results from the first and
second ELISAs were retested via Western blot (3). The 2007 EDS-RDC reports HIV
prevalence as 1.3% in individuals aged 15-59 years (2). Both men and women living in
urban areas showed a higher HIV prevalence compared to those in rural areas (2). In women, HIV
prevalence was highest in the most educated and most wealthy compared to the least
educated and least wealthy (2).

DNA  Extraction  Methods 

Remaining  dried  blood  spots  collected  for  HIV  testing  were  utilized  for  detection 
of  malaria  parasitemia  (291).  Sample  collection,  storage,  and  extraction  methods  are 
published  elsewhere  (291).  Briefly,  genomic  DNA  was  extracted  from  blood  spots  and 
utilized as template for a Plasmodium genus-specific 18S ribosomal RNA-based qPCR
assay and additional species-specific 18S ribosomal RNA-based qPCR assays to detect P.
falciparum,  P.  ovale,  and  P.  malariae  (289;  291). 

Individual  Sample  Identification 

For this study, we identified HIV positive samples (n=23) that were also positive
for P. falciparum but negative for all other malaria parasite species based on the 18S
species-specific qPCR. We then matched the 23 double positive (HIV/P. falciparum)
samples to P. falciparum positive, HIV negative samples based on age and sex. To
account for the low number of double positive samples, matching was done at a 4:1 ratio
of HIV negative (n=92) to HIV positive samples (n=23) for a total of 115 individual
samples. A sample size of 115 has 81% power to detect a mean COI difference
of 1.1 between HIV positive and HIV negative individuals, based on a two-sided 95%
confidence interval.

Geographical  Clusters  Identification 

To  analyze  geospatial  differences  of  P.  falciparum  haplotypes,  we  selected  88 
clusters  from  the  original  300  clusters  identified  in  the  2007  EDS-RDC  that  were  the 
same  or  geographically  close  to  the  clusters  from  which  the  individual  samples  were 
identified.  We  combined  DNA  samples  to  generate  a  pooled  “geographical  cluster 
sample”  that  contained  the  DNA  from  all  samples  identified  in  that  particular  cluster 
(n=88). The number of samples that were pooled for each geographical cluster sample
ranged from 1 to 25. We utilized these geographical cluster samples in order to
compare the pfama1 haplotypes found in individual samples to pfama1 haplotypes found
in the same or surrounding geographical clusters. Additionally, pooling samples from a
particular  geographical  cluster  reduces  the  cost  and  labor  of  sequencing  each  sample 
individually. 

P.  falciparum  lactate  dehydrogenase  qPCR 

To confirm the presence of P. falciparum, all individual samples (n=115) were
screened for the P. falciparum lactate dehydrogenase (pfldh) gene by qPCR on the ABI
ViiA 7 platform (Life Technologies). Primer/probe sequences and PCR cycling conditions
are  described  elsewhere  (233).  Approximate  pfldh  concentrations  were  calculated  for 
individual  samples  using  a  standard  curve  generated  from  known  concentrations  of  P. 
falciparum  DNA. 
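
The standard curve quantification described above can be sketched as follows. This is a minimal illustration only, assuming a log-linear relationship between the qPCR cycle threshold (Ct) and the starting DNA concentration; the dilution series, the Ct values, and the function names are hypothetical and are not taken from this study or from the ViiA 7 software.

```python
import math

def fit_standard_curve(concentrations, cts):
    """Least-squares fit of Ct = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def estimate_concentration(ct, slope, intercept):
    """Invert the fitted curve to estimate the concentration of an unknown."""
    return 10 ** ((ct - intercept) / slope)

# Hypothetical ten-fold dilution series (ng/ml) and made-up Ct values.
standards = [1000.0, 100.0, 10.0, 1.0, 0.1]
standard_cts = [18.0, 21.4, 24.8, 28.2, 31.6]

slope, intercept = fit_standard_curve(standards, standard_cts)
unknown_conc = estimate_concentration(26.5, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, unknown={unknown_conc:.2f} ng/ml")
```

Samples whose Ct falls outside the range spanned by the standards are extrapolated rather than interpolated, so their concentrations should be read as approximate.
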

P. falciparum apical membrane antigen 1 (pfama1) PCR

Heminested primers were designed using Primer3 (305) to amplify a region of the
pfama1 gene (GenBank Reference XM_001347979.1). A heminested approach was used
to increase sensitivity for detection of low-level concentrations of P. falciparum DNA.
The Ama1OF and Ama1R primers amplify a 266 base pair (bp) region and the Ama1F and
Ama1R primers amplify a 236 bp region (Table 15 and Figure 12). A 10 nucleotide
Multiplex Identifier (MID) barcode sequence was added to the 5’ end of the Ama1F
primer to allow for pooling of individual sample PCR products during library preparation
(9). Twenty-two unique MID sequences were utilized (Table 16).

Following a heminested PCR approach, an initial PCR (round 1) was performed
and the PCR product was utilized as template for a second PCR (round 2). For round 1 PCR,
Ama1OF and Ama1R primers were used. For round 2 PCR, Ama1F and Ama1R primers
were used. Details for the pfama1 heminested PCR and cycling conditions are shown in Table
15. PCR products were visualized on 1% agarose gels stained with ethidium bromide.
Positive PCR products were purified using the Invitrogen PureLink Pro 96 PCR
Purification Kit and purified PCR product concentration was measured in duplicate using
the Invitrogen Quant-iT PicoGreen dsDNA Assay Kit according to the manufacturer’s
instructions.

Pfama1 Amplicon-based Deep Sequencing

Purified  PCR  products  from  individual  samples  and  geographical  cluster  samples 
were  organized  into  18  indexes  such  that  each  index  contained  PCR  products  with  unique 
MIDs.  We  added  ten  nanograms  of  each  PCR  product  to  the  appropriate  index.  The  DNA 
concentration of each of the 18 indexes was analyzed using the Agilent High Sensitivity
D1000 ScreenTape Assay on the 2200 TapeStation (Agilent Technologies) according to
the manufacturer’s instructions to ensure DNA concentration between indexes was similar.

Ion  Torrent  library  preparation  for  amplicon-based  deep  sequencing  was 
performed  following  the  “Preparing  Short  Amplicon  (<350)  Libraries  Using  the  Ion  Plus 
Fragment  Library  Kit”  manual  (Life  Technologies,  MAN0006846,  revision  3.0).  After 
library  preparation,  we  determined  the  DNA  concentration  of  each  of  the  18  libraries 
using the Agilent High Sensitivity D1000 ScreenTape Assay according to the manufacturer’s
protocol. Equal concentrations of each library were subsequently pooled together and
split across two Ion 318 Chips (Life Technologies) utilizing 400 bp chemistry on the Ion
Torrent PGM platform (Life Technologies) at the University of North Carolina at Chapel
Hill Microbiome Core Facility.

SeekDeep  Bioinformatics  Pipeline 

We  utilized  the  SeekDeep  targeted  amplicon  bioinformatics  pipeline  for  sequence 
data extraction, processing, and analysis. SeekDeep
(http://baileylab.umassmed.edu/SeekDeep) was developed by Nicholas Hathaway and Dr.
Jeffrey  Bailey  at  the  University  of  Massachusetts  School  of  Medicine  (169).  The  pipeline 
consists  of  three  steps:  extractor,  qluster,  and  processClusters.  An  overview  of  the 
SeekDeep  process  is  shown  in  Figure  13.  The  first  step  is  extractor,  in  which  the 
sequence  data  is  “de-multiplexed”  such  that  samples  are  identified  and  separated  based 
on  MID  barcodes.  The  extractor  step  also  applies  several  quality  control  steps,  including 
removing  short  sequence  reads  and  reads  with  poor  quality  scores.  The  second  step, 
qluster,  filters  and  compiles  the  raw  sequence  reads  to  separate  unique  haplotypes 
through  an  iterative  process.  The  last  step,  processClusters,  compares  haplotypes  found  in 
replicate samples, performs additional filtering of poor quality and chimeric reads, and
generates  the  final  population  of  haplotypes.  Output  files  were  analyzed  in  the 
popClusteringViewer,  which  is  generated  in  the  processClusters  step  and  creates  a  local 
HTTP  server  based  interactive  site  to  view  sequencing  results. 
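
The de-multiplexing idea behind the extractor step can be illustrated with a short sketch. This is not SeekDeep's implementation: the function name, the exact-match barcode rule, the length threshold, and the example barcodes and reads are all assumptions made for illustration, and quality score filtering is omitted.

```python
def demultiplex(reads, barcodes, min_length=50):
    """Group reads by their leading MID barcode and trim the barcode off.

    reads      -- iterable of read sequences (strings)
    barcodes   -- dict mapping MID barcode sequence to sample name
    min_length -- reads shorter than this after trimming are discarded
    """
    by_sample = {name: [] for name in barcodes.values()}
    unassigned = []
    for read in reads:
        for mid, name in barcodes.items():
            if read.startswith(mid):
                insert = read[len(mid):]
                # Reads that carry a barcode but are too short are dropped.
                if len(insert) >= min_length:
                    by_sample[name].append(insert)
                break
        else:
            # No barcode matched the start of this read.
            unassigned.append(read)
    return by_sample, unassigned

# Hypothetical 10 nucleotide barcodes and toy reads (not from Table 16).
example_barcodes = {"ACGAGTGCGT": "sample_A", "ACGCTCGACA": "sample_B"}
example_reads = [
    "ACGAGTGCGT" + "TTGACCATGG",  # sample_A, kept
    "ACGCTCGACA" + "GG",          # sample_B, too short after trimming
    "TTTTTTTTTTTTTT",             # no recognizable barcode
]
by_sample, unassigned = demultiplex(example_reads, example_barcodes, min_length=5)
print(by_sample, unassigned)
```

A production pipeline would typically also tolerate sequencing errors within the barcode and check the primer sequences; exact matching is used here only to keep the sketch short.
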

Pfama1 Haplotype Analysis

DnaSP (v5.10.1) was used for analysis of nucleotide polymorphisms and
haplotype diversity (166; 251). Population pairwise (FST) comparisons were determined
between DRC provinces using the Analysis of Molecular Variance (AMOVA) tool in the
Arlequin (v3.5.2.2) population genetics data analysis program (94). We used the program
Network (version 4.613) along with the DNA Alignment (v1.3.3.2) and Network
Publisher (v2.0.01) add-ons (www.fluxus-engineering.com) to generate a median-joining
(MJ) network diagram for visualization of phylogenetic relationships between pfama1
haplotypes (34). EstimateS (v9.1.0) was utilized for developing rarefaction curves (66).
Phylogenetic  analysis  was  performed  using  MEGA  version  6  (284). 

Statistical  Analyses  and  Data  Visualization 

Microsoft  Excel,  GraphPad  Prism  (v6),  and  SPSS  (v22)  were  used  for  statistical 
analyses.  Maps  were  created  in  ArcGIS  ArcMap  (ESRI,  v.10.2.2).  The  DRC  province 
boundary  map  was  obtained  from  the  Spatial  Data  Repository,  part  of  the  DHS  Program 
(5). 

Results


P.  falciparum  lactate  dehydrogenase  qPCR 

All individual samples (n=115) were tested for P. falciparum lactate
dehydrogenase  (pfldh)  by  qPCR.  Of  the  115  individual  samples,  99  were  positive  for 
pfldh  (Table  17).  The  concentration  of  pfldh  ranged  from  less  than  0.1  ng/ml  to  over 
1,000  ng/ml  (Table  17)  based  on  standard  curve  analysis.  Approximately  56%  (n=55)  of 
samples  that  were  positive  for  pfldh  were  found  at  concentrations  of  less  than  0.1  ng/ml. 

Pfama1 Conventional PCR

P. falciparum ama1 (pfama1) specific conventional PCR was performed on all
individual samples (n=115) and geographical cluster samples (n=88).

Pfama1 Results for Individual Samples

For the individual samples, 12 out of the 16 (75%) samples that were negative for
the pfldh qPCR were also negative with the pfama1 PCR (Table 17), suggesting that P.
falciparum nucleic acid was absent or below the limit of detection for these assays. Four
samples that were negative for pfldh were positive for pfama1 (Table 17). Twenty-one
samples were positive for pfldh but negative for pfama1, and 16 of these pfama1 negative
samples had pfldh concentrations <0.1 ng/ml. Therefore, low P. falciparum nucleic acid
concentrations may limit the detection ability of the pfama1 PCR assay. Overall, 81 of
the 115 (70.4%) individual samples were positive for pfama1 and subsequently utilized
for downstream deep sequencing reactions.

Pfama1 Results for Geographical Cluster Samples

Of the 88 pooled geographical cluster samples, 82 were successfully amplified
using the pfama1 PCR assay and utilized for downstream deep sequencing reactions.

Pfama1 Amplicon-based Deep Sequencing

We successfully sequenced 79 (68.7%) of the individual samples, 76 (86.4%) of
the pooled geographical cluster samples, and six sequencing controls (n=161). We
employed a conservative 2.5% haplotype cutoff, meaning we included pfama1 haplotypes
generated from sequence reads that occurred at a frequency of >2.5% within a given
sample. A 2.5% cutoff was chosen in order to limit the inclusion of nonspecific and
chimeric sequences in the dataset. Using the stringent 2.5% cutoff, we obtained
3,739,195 sequencing reads with an average sequence length of approximately 195 bp.
We estimated the average number of reads per individual sample/geographical cluster
sample to be 23,225 reads (3,739,195 total reads / 161 samples = 23,225 reads).
Therefore, based on a 2.5% cutoff, a minor frequency haplotype in a sample of average
depth was supported by at least approximately 581 sequencing reads (23,225 × 2.5% = 581
reads). Rarefaction curve analysis using the 2.5% cutoff indicates that the inclusion of more
samples would likely result in the detection of additional haplotypes, as we have not yet
reached the asymptote (Figure 14). We identified a total of 88 unique pfama1 haplotypes
from both the individual samples and the pooled geographical cluster samples.
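
The read depth arithmetic and the 2.5% haplotype cutoff described above can be reproduced with a short sketch. The two totals (3,739,195 reads and 161 samples) come from the text; the function names and the per-haplotype read counts in the example are hypothetical, and the sketch counts haplotypes in a single replicate only, whereas SeekDeep additionally compares replicates and filters chimeras.

```python
def call_haplotypes(read_counts, min_fraction=0.025):
    """Return haplotypes whose within-sample read fraction exceeds the cutoff."""
    total = sum(read_counts.values())
    fractions = {hap: n / total for hap, n in read_counts.items()}
    return {hap: frac for hap, frac in fractions.items() if frac > min_fraction}

def complexity_of_infection(read_counts, min_fraction=0.025):
    """COI is taken here as the number of haplotypes passing the cutoff."""
    return len(call_haplotypes(read_counts, min_fraction))

# Figures reported in the text.
mean_reads = 3_739_195 / 161              # average reads per sample
min_supporting_reads = mean_reads * 0.025  # reads behind a haplotype at the cutoff

# Hypothetical read counts for one sample of average depth.
example_counts = {"hap_A": 15000, "hap_B": 7000, "hap_C": 1000, "hap_D": 225}
example_coi = complexity_of_infection(example_counts)

print(round(mean_reads), round(min_supporting_reads), example_coi)
```

In this toy sample, hap_D accounts for just under 1% of reads and is excluded, so the sample is scored as COI=3.
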

Several  additional  quality  control  methods  were  utilized  to  reduce  background 
and  limit  inclusion  of  sequencing  artifacts.  First,  all  individual  samples  and  geographical 
cluster  samples  were  PCR  amplified  and  sequenced  in  duplicate.  Sequencing  results  for 
each  duplicate  were  then  compared  to  ensure  the  haplotypes  matched  in  sequence  and 
fraction between duplicates. Second, a high fidelity PCR polymerase was utilized to
reduce PCR error and generation of artifactual SNPs during PCR cycles. Finally, in order to reduce
sequencing bias during library preparation (i.e., preferential amplification of dominant
haplotypes  within  pooled  samples),  equal  concentrations  of  PCR  products  from 
individual  and  geographical  cluster  samples  were  calculated  and  added  during  library 
preparation  steps. 

A  sequencing  quality  control  sample  was  generated  by  mixing  known 
concentrations  of  DNA  from  several  P.  falciparum  strains  and  subsequently  used  as 
template in duplicate for three separate pfama1 PCRs followed by library preparation and
deep sequencing. The sequencing control contained P. falciparum DNA from V1/S, RO33,
Dd2, 7G8, and K1 strains at 5, 10, 15, 30, and 40 percent, respectively. As shown in
Figure  15,  the  six  sequencing  controls  show  the  expected  haplotypes  at  approximately  the 
same  frequencies. 

Individual  Samples 

For  the  individual  samples  (n=79),  a  total  of  68  haplotypes  were  identified. 
Demographic  characteristics  for  the  79  individual  samples  are  shown  in  Table  18.  The 
majority of individual samples showed multiclonal P. falciparum infections (64.5%),
defined as a COI>1. The COI for individual samples ranged from 1 to 9 haplotypes, with
a mean of 2.43 (Table 9). Figure 16 shows the geographical location of the 79 individual
samples within the DRC. The size of the pie chart reflects the COI of the individual while
the colors within the pie chart represent the particular haplotype fractions within a single
individual. No clustering of particular pfama1 haplotypes was observed based on
geographical  location  in  Figure  16. 

No differences in P. falciparum COI estimated by pfama1 haplotypes were
observed based on age (Figure 17), sex (Figure 18), or HIV status (Figure 19). We
utilized the median (COI=2) as the cutoff to identify individual samples with a high COI
(COI>2) versus a low COI (COI<2). Two by two contingency analysis based on HIV
status and a high COI versus low COI showed no difference (two-tailed t-test P=0.7955).
A logistic regression analysis to estimate the odds ratio showed no significant impact of
age, sex, or HIV status on high COI or low COI (Figure 20). We also used linear
regression to analyze COI as a continuous variable and also found no impact of age, sex,
or HIV status on COI. As elevation has been shown to influence malaria
transmission  based  on  suitability  for  maintaining  mosquito  vector  habitat  (183;  293),  we 
extrapolated  elevation  data  based  on  geographical  locations  of  individual  samples.  No 
relationship  was  observed  between  COI  and  elevation  (Figure  21). 

To analyze COI differences between the 11 DRC provinces, we averaged
individual COIs within each province (Figure 22A, Table 19). We also compared average
COI within a particular province with the percent of samples possessing a monoclonal
(COI=1) versus percent polyclonal (COI>1) P. falciparum infection (Figure 22B).
Kasai-Oriental province had the highest average COI (COI=3.5), while Bandundu, Nord-Kivu,
and Orientale had COIs>2.5 (Figure 22A, Table 19). As several studies have shown that
higher  COIs  coincide  with  regions  of  high  malaria  prevalence  (72;  310),  we  compared 
individual  COI  with  malaria  prevalence  based  on  data  from  the  2007  EDS-RDC  samples 
reported  in  a  separate  study  (291).  However,  we  did  not  find  a  significant  correlation 
between  COI  and  malaria  prevalence  (Figure  23). 

Pooled Geographical Cluster Samples

Sixty-two haplotypes were identified in the pooled geographical cluster samples
(n=76) and 82.9% of these samples were polyclonal (COI>1). Comparison of haplotypes
between the individual samples and the pooled geographical cluster samples from the
same or a nearby geographical cluster revealed that individuals were infected with
haplotypes that were not found in the same or nearby pooled geographical cluster
sample. Further, we identified 27 unique pfama1 haplotypes in individual samples
that were not identified in the pooled geographical cluster samples and, similarly, we
identified 20 unique pfama1 haplotypes in the geographical cluster samples that we did
not find in individual samples. These data suggest that there is no geospatial clustering of
particular pfama1 haplotypes within our particular sample set from the DRC.

Population  Genetics  Analyses 

Several population genetics analysis methods were utilized to investigate the
pfama1 haplotype diversity within individual samples in the DRC (Table 19). There were
a total of 38 polymorphic sites (S) within the targeted region of pfama1 across all 79
individual samples. We analyzed the nucleotide diversity (π), which is a measure of
polymorphism based on the average nucleotide differences per site between randomly
chosen sequences (204). Nucleotide diversity was similar between HIV negative and HIV
positive individuals, and across different DRC provinces. We also determined haplotype
diversity (Hd), another measure of population diversity, which estimates the probability
that two randomly selected haplotypes in a population are different (117; 203). For all
individual samples, the Hd was 0.998 (Table 19), indicating high haplotype diversity. We
compared Hd between haplotypes identified based on HIV status and DRC province and
found high Hd across all groups (Table 19).
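
The two diversity measures defined above can be made concrete with a small sketch. It assumes the commonly used sample-size-corrected estimator Hd = n/(n-1) × (1 - Σp²) and computes π as the mean number of pairwise differences divided by the sequence length; whether DnaSP applies exactly these forms to our data is not verified here, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations

def haplotype_diversity(sequences):
    """Probability that two randomly drawn sequences are different haplotypes."""
    n = len(sequences)
    counts = {}
    for seq in sequences:
        counts[seq] = counts.get(seq, 0) + 1
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)

def nucleotide_diversity(sequences):
    """Average number of nucleotide differences per site between sequence pairs."""
    length = len(sequences[0])
    pairs = list(combinations(sequences, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)

# Toy aligned sequences of equal length (not pfama1 data).
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
hd = haplotype_diversity(toy)
pi = nucleotide_diversity(toy)
print(f"Hd={hd:.3f}, pi={pi:.3f}")
```

Both functions expect at least two aligned sequences of equal length; an Hd close to 1, as reported in Table 19, means that nearly every pair of sampled haplotypes differs.
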

To investigate geospatial haplotype diversity, we compared the population
fixation index (FST) between DRC provinces from individual samples. FST is a measure of
the interpopulation heterogeneity based on allele frequencies and ranges between 0 and 1,
indicating random mating (panmixis) and isolated populations, respectively (Table 20).
The average FST value is 0.009, ranging from 0 to 0.04312. The low FST values suggest
that pfama1 haplotypes are not restricted based on DRC province.
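
To give a sense of how a pairwise FST behaves at its two extremes, the sketch below uses a simple sequence-based estimator, FST = 1 - Hw/Hb, where Hw is the mean number of pairwise differences within populations and Hb is the mean between populations. This is a swapped-in illustration and not the AMOVA-based calculation performed by Arlequin; the function names, the clamping of negative estimates to zero, and the toy sequences are assumptions.

```python
from itertools import combinations, product

def mean_pairwise_diff(pairs):
    """Mean number of differing sites over an iterable of sequence pairs."""
    pairs = list(pairs)
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)

def pairwise_fst(pop1, pop2):
    """Sequence-based FST between two populations (each with >= 2 sequences)."""
    within = (
        mean_pairwise_diff(combinations(pop1, 2))
        + mean_pairwise_diff(combinations(pop2, 2))
    ) / 2
    between = mean_pairwise_diff(product(pop1, pop2))
    if between == 0:
        return 0.0
    # Small samples can give negative estimates; they are reported as 0 here.
    return max(0.0, 1 - within / between)

# Toy populations (not pfama1 data).
fst_isolated = pairwise_fst(["AAAA", "AAAA"], ["TTTT", "TTTT"])
fst_shared = pairwise_fst(["AAAA", "AAAT"], ["AAAA", "AAAT"])
fst_partial = pairwise_fst(["AAAA", "AAAT"], ["TTTT", "TTTA"])
print(fst_isolated, fst_shared, round(fst_partial, 3))
```

Two populations with no shared variation give a value of 1, while two populations drawing on the same haplotypes give 0, which is the pattern that the low values in Table 20 resemble.
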

To further explore whether certain pfama1 haplotypes or related groups of
haplotypes dominated in geographical regions within the DRC, we constructed a
Median-Joining Network Diagram using the 68 haplotypes found in individual samples (Figure
24). Our Network Diagram analysis showed no clear clustering of related pfama1
sequences based on DRC province, but instead illustrates the diversity of pfama1
haplotypes found throughout the DRC. We also constructed a phylogenetic tree from the
68 pfama1 haplotypes from individual samples to further explore the relatedness between
haplotypes (Figure 25).

Discussion 

This study shows the utility of amplicon-based deep sequencing and the
SeekDeep analysis pipeline to detect minor variants and analyze multiclonal P.
falciparum infections. Using sensitive deep sequencing technology, we showed
widespread sequence polymorphism of pfama1 in P. falciparum parasites collected from
asymptomatic individuals in the DRC. We also found that the majority of P. falciparum
infections in the DRC are multiclonal (COI>1). We found no association between several
demographic factors, including age, sex, elevation, and HIV status, and the P. falciparum
COI within individuals.

We also explored the geospatial distribution of pfama1 haplotypes within the
DRC and found no spatial clustering of haplotypes. Our findings are in agreement with
another recent study that describes the population genetics of P. falciparum within the
DRC as a “complex and fragmented landscape” (60). These results suggest that P.
falciparum parasites are not restricted based on geography within the DRC, but instead
are likely moving along with their human hosts between provinces and neighboring
countries (60). Other possible explanations for the geospatial diversity of pfama1
haplotypes  in  the  DRC  include  movement  of  infected  mosquito  vectors  between 
provinces  and  immune  selection  that  maintains  extensive  malaria  antigenic  diversity 
within  the  DRC  (282). 

Selection of amplicon-based deep sequencing of pfama1 for detection of
multiclonal P. falciparum infections and COI analyses was based on several factors.
Amplicon-based deep sequencing is a cost-effective method for targeted sequencing that
results in very high coverage compared to whole genome sequencing, which can be more labor
intensive and expensive. Additionally, using our amplicon-based deep sequencing
approach we were able to pool several dozen samples, which also reduced overall cost
and allowed us to analyze a larger sample size. Pfama1 was chosen as a deep sequencing
target for several reasons, including 1) prior success of deep sequencing pfama1 from
dried blood spot samples, 2) pfama1 is a single copy gene, 3) pfama1 contains fewer
repetitive sequences, which can cause sequencing errors, than other targets for COI
analysis, and 4) relevance of pfama1 haplotype diversity to AMA1-based malaria
vaccine development. We chose the Ion Torrent PGM (Life Technologies) as our
deep sequencing platform because of its availability at the University of North Carolina
at Chapel Hill, prior experience of laboratory personnel, and reduced cost compared to other
deep sequencing platforms.

Our study has several limitations. First, despite the sensitivity of deep sequencing
for detecting minor variants within a population, we likely missed P. falciparum
haplotypes due to one or more of the following: 1) degradation of P. falciparum nucleic
acid in dried blood spots, 2) sequence polymorphisms in the pfama1 primer binding
regions, 3) haplotype frequency below the limit of detection of the pfama1 PCR, and 4)
haplotype sequence frequency less than the 2.5% cutoff for SeekDeep analysis. Second,
we analyzed a subset of the total P. falciparum positive samples collected from the 2007
EDS-RDC, which may have limited our ability to detect significant differences in
COI based on age, sex, and HIV status. Third, our analyses of COI and P. falciparum
population genetics are restricted to asymptomatic malaria infections in the DRC. The
inclusion of symptomatic malaria infections might have provided additional information
on the pfama1 haplotypes that cause clinical disease or the relationship between COI and
disease in the DRC. Finally, we used an amplicon-based deep sequencing approach
based on pfama1 as a marker to identify different P. falciparum strains. Although similar
approaches have been used previously to determine multiclonal/multistrain malaria
infections (72; 134; 169), this method assumes that genetic polymorphisms within the
single copy pfama1 reflect genetically distinct P. falciparum strains. As the P.
falciparum genome encompasses over 5,000 protein-coding genes, the use of pfama1 as a
marker of genetic diversity likely underestimates the true genomic diversity of P.
falciparum strains circulating in the DRC.
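
The effect of the 2.5% within-sample frequency cutoff on COI can be illustrated with a minimal sketch. It is not the SeekDeep algorithm, which clusters reads before calling haplotypes; it assumes reads have already been assigned to haplotype labels, and the function names and toy read counts are hypothetical.

```python
from collections import Counter


def call_haplotypes(read_haplotypes, min_fraction=0.025):
    """Return {haplotype: within-sample fraction} for haplotypes at or above the cutoff.

    read_haplotypes: one haplotype label per read (hypothetical input format).
    min_fraction: minimum within-sample read fraction; 0.025 mirrors the 2.5% cutoff.
    """
    counts = Counter(read_haplotypes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        hap: n / total
        for hap, n in counts.items()
        if n / total >= min_fraction
    }


def complexity_of_infection(read_haplotypes, min_fraction=0.025):
    """COI taken here as the number of distinct haplotypes that pass the cutoff."""
    return len(call_haplotypes(read_haplotypes, min_fraction))


# Toy sample: three haplotypes at 90%, 8%, and 2% of 1,000 reads.
reads = ["hapA"] * 900 + ["hapB"] * 80 + ["hapC"] * 20
kept = call_haplotypes(reads)
coi = complexity_of_infection(reads)
```

In this toy sample the 2% haplotype falls below the cutoff, so the sample is called with a COI of 2 even though three haplotypes are present, which is the kind of underestimate described in limitation 4 above.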

Somewhat unexpectedly, we found no significant correlation between P.
falciparum prevalence and COI. However, our inability to detect a positive correlation
between P. falciparum COI and malaria prevalence may be due to the small sample size
included in this study or to the exclusion of minor frequency haplotypes below the
2.5% cutoff. Since we did not include all malaria positive samples from the 2007
EDS-RDC in the current study, inclusion of additional samples for COI analysis could
improve our ability to detect a correlation between P. falciparum prevalence and COI. Although
several studies have recently described a significant relationship between COI and
malaria prevalence (72; 104; 206; 306; 310), our data indicate that additional research is
needed to confirm this relationship in the DRC.

Despite high malaria prevalence, relatively few studies have been conducted to
understand malaria epidemiology and population genetics in the DRC (60; 187-189; 191;
288; 291; 292; 317). Our results have several implications for future malaria
epidemiology studies in the DRC. First, AMA1 is a highly studied malaria vaccine
candidate and is currently undergoing clinical trials in other malaria endemic regions (4;
270; 302; 304). As several studies have now shown, AMA1-based vaccines are effective
against P. falciparum strains that have the same (homologous) or highly similar pfama1
alleles (68; 85). We identified a total of 88 pfama1 haplotypes and provide evidence of
the extensive pfama1 diversity present within the DRC, which might be considered when
identifying appropriate P. falciparum haplotypes to include in an AMA1-based
vaccine for the DRC and surrounding countries. Understanding the pfama1 haplotype
diversity in a malaria endemic region is critical for developing an AMA1-based vaccine
that would successfully protect against disease from circulating endemic P. falciparum
strains (68; 85; 238; 281; 294; 295).

Second, the 2007 EDS-RDC samples were collected prior to the scale-up of
malaria intervention strategies aimed at reducing malaria transmission and disease in the
DRC. The DRC became a PMI focus country in 2010 and has since seen an increase in
intervention strategies to reduce malaria (12; 13). Five “key intervention areas” are now
part of the DRC Malaria Operational Plan (MOP) under the National Malaria Control
Program (NMCP) and PMI: insecticide-treated bed nets (ITNs), malaria rapid
diagnostic tests (mRDTs), artemisinin-based combination therapies (ACTs), intermittent
preventive treatment in pregnancy (IPTp) with sulfadoxine-pyrimethamine (SP), and
training of health care workers in malaria treatment and diagnosis (12; 13). A second
Demographic and Health Survey was conducted in the DRC (EDS-RDC II) between
November 2013 and February 2014. The EDS-RDC II reported that 70% of households owned
at least one ITN and that 56% of children under five and 60% of pregnant women slept under an
ITN the night before the survey (193). Microscopic diagnosis for malaria in children (age 6-59
months) was performed and showed 23% of children positive for malaria parasitemia
(193). The EDS-RDC II results highlight the success of malaria control interventions in
the DRC since initiation of the PMI program compared to the data reported in the 2007
EDS-RDC. As P. falciparum COI has been shown to decline in malaria endemic regions
where malaria transmission is reduced, our results can be utilized as a baseline for
comparison with samples collected in the 2013-2014 EDS-RDC II (72; 212; 263). Future
studies are planned to compare P. falciparum COI and haplotype population genetics
between the 2007 and 2013-2014 EDS-RDCs in order to understand the changing malaria
genetic landscape in the DRC and also monitor for changes in transmission due to
enhanced, sustained, or lapsed malaria control campaigns (72).

As we approach the goal of malaria control and elimination, sensitive detection
methods such as amplicon-based deep sequencing technologies have the potential to
increase our understanding of the antigen heterogeneity of malaria vaccine candidates,
monitor for changes in malaria transmission intensity, and provide additional information
about malaria population genomics.

Future  Directions 

We plan to utilize the pfama1 amplicon-based deep sequencing technique and
SeekDeep analysis platform to characterize P. falciparum COI and detect minor variants
in three future epidemiology studies. First, we are presently collecting cross-sectional
samples from individuals with asymptomatic malaria in a malaria holoendemic region of
Western Kenya. These cross-sectional samples will also be tested for HIV, which
will allow for further investigation into the impact of HIV on P. falciparum COI
in a larger sample set. Additionally, we will conduct a longitudinal
study that will allow us to investigate changes in P. falciparum COI in HIV
positive individuals as they initiate antiretroviral therapy and to analyze changes in P.
falciparum COI based on CD4 T cell counts. Second, we will be receiving both
cross-sectional and longitudinal samples from malaria asymptomatic individuals in Nigeria.
These individuals will also be tested for HIV and a subset of HIV positive individuals
will be followed longitudinally. Finally, as mentioned in the discussion, we are interested
in comparing the P. falciparum COI and pfama1 haplotype data between the 2007 EDS-RDC
and the 2013-2014 EDS-RDC II samples to see if we can detect changes in P.
falciparum COI following the scale-up of malaria intervention strategies in the DRC.

Table 15. Pfama1 heminested primer sequences and PCR conditions.

PCR Primers                          *Primer Sequence (5’->3’)
Ama1OF                               GCTGAAGTAGCTGGAACTCAA
Ama1F                                XXXXXXXXXXCCATCAGGGAAATGTCCAGT
Ama1R                                TTTCCTGCATGTCTTGAACA

PCR Reagents                         Final concentration in PCR
Platinum PCR SuperMix High
Fidelity (Life Technologies)         0.83X
Ama1OF/F primer (20 µM)              170 nM
Ama1R primer (20 µM)                 170 nM
MgCl2 (25 mM)                        420 nM
Template DNA volume                  6.5 µl
Total volume                         30 µl

PCR Conditions                       Time      Temperature
Step 1: Initial denaturation         2 min     94°C
Step 2: Denaturation                 30 sec    94°C
Step 3: Annealing                    30 sec    55°C
Step 4: Elongation                   1 min     68°C
Step 5: Cycling                      Repeat steps 2-5 for 40 cycles total
Step 6: Final elongation             10 min    68°C

Platform                             BIO-RAD T100 Thermal Cycler

*Xs represent the 10-nucleotide MID sequence added to the 5’ end of primer Ama1F.


Table 16. Pfama1 multiplex identifying (MID) sequences.

MID      Sequence (5’->3’)     MID      Sequence (5’->3’)
MID1     ACGAGTGCGT            MID14    CGAGAGATAC
MID2     ACGCTCGACA            MID15    ATACGACGTA
MID3     AGACGCACTC            MID16    TCACGTACTA
MID4     AGCACTGTAG            MID17    CGTCTAGTAC
MID5     ATCAGACACG            MID18    TCTACGTAGC
MID6     ATATCGCGAG            MID19    TGTACTACTC
MID7     CGTGTCTCTA            MID20    ACGACTACAG
MID8     CTCGCGTGTC            MID21    CGTAGACTAG
MID10    TCTCTATGCG            MID22    TACGAGTATG
MID11    TGATACGTCT            MID23    TACTCTCGTG
MID13    CATAGTAGTG            MID24    TAGAGACGAG
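
How the MID sequences in Table 16 combine with the forward primer in Table 15 can be sketched as follows. This is an illustrative sketch only: it uses exact string matching with no tolerance for sequencing errors, the helper names are hypothetical, and it is not how SeekDeep itself assigns reads to samples.

```python
# Forward inner primer without its MID tag (Table 15) and three MIDs from Table 16.
AMA1F_CORE = "CCATCAGGGAAATGTCCAGT"
MIDS = {
    "MID1": "ACGAGTGCGT",
    "MID2": "ACGCTCGACA",
    "MID3": "AGACGCACTC",
}


def tagged_primer(mid_name):
    """Build the MID-tagged forward primer: 10-nt MID on the 5' end, then the primer."""
    return MIDS[mid_name] + AMA1F_CORE


def assign_mid(read):
    """Return the MID name whose tagged primer starts the read, or None if no exact match."""
    for mid_name in MIDS:
        if read.startswith(tagged_primer(mid_name)):
            return mid_name
    return None


# A toy read: MID2-tagged primer followed by a few bases of amplicon.
example_read = tagged_primer("MID2") + "ACGTACGT"
assigned = assign_mid(example_read)
```

Because each sample's duplicate PCRs carry different MIDs, reads from pooled PCR products can be traced back to the PCR they came from by this 5’ tag.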

Table 17. Individual sample pfldh qPCR, pfama1 PCR, and pfama1 sequencing results

Sample  Pfldh qPCR  Pfldh qPCR ng/ml  Pfama1 PCR  Pfama1 sequence
A0D1E   Positive    <0.10             Negative    Negative
A5F6Q   Positive    0.43              Positive    Positive
A6U7I   Positive    <0.10             Positive    Positive
A8F4R   Positive    <0.10             Negative    Negative
A8FOT   Positive    <0.10             Positive    Positive
BOE1G   Positive    0.21              Positive    Positive
BOR5X   Positive    <0.10             Negative    Negative
B3H5Q   Positive    0.93              Positive    Positive
B4Y1F   Positive    <0.10             Positive    Positive
B6H1P   Positive    0.43              Positive    Positive
B8C7S   Positive    0.35              Positive    Positive
B9H9B   Positive    0.40              Positive    Positive
B9N0X   Negative    n/a               Negative    Negative
C4B1I   Positive    0.11              Positive    Positive
C5W9N   Positive    0.24              Positive    Positive
C7U8M   Positive    21.34             Positive    Positive
C7X1I   Positive    >1000             Positive    Negative
D1N7Y   Positive    0.18              Positive    Positive
E4U6J   Positive    <0.10             Negative    Negative
E5K5Y   Positive    <0.10             Positive    Positive
E503B   Positive    <0.10             Positive    Positive
E7Q3F   Positive    0.10              Positive    Positive
F1Q8F   Negative    n/a               Negative    Negative
F8M5F   Negative    n/a               Negative    Negative
G0N4X   Positive    <0.10             Positive    Positive
G1K9B   Positive    <0.10             Positive    Positive
G3U03   Positive    0.17              Positive    Positive
G3U5J   Positive    <0.10             Negative    Negative
G9U2M   Positive    <0.10             Negative    Negative
H107E   Positive    <0.10             Positive    Positive
H2F2Q   Positive    <0.10             Negative    Negative
H5R5J   Positive    0.17              Positive    Positive
IOY1I   Positive    <0.10             Positive    Positive
I1Y4M   Positive    3.52              Positive    Positive
I2H8Z   Positive    0.39              Positive    Positive
I4T0G   Positive    <0.10             Negative    Negative
I7J7G   Positive    0.33              Positive    Positive
J1B30   Positive    0.12              Positive    Positive
J1K6B   Positive    6.57              Positive    Positive
J2D8W   Positive    0.20              Negative    Negative
J3U0H   Positive    0.20              Positive    Positive
J7J4E   Positive    0.34              Positive    Positive
J806M   Positive    <0.10             Positive    Positive
K2J7D   Positive    <0.10             Negative    Negative
K4V5P   Positive    0.15              Positive    Positive
K5C3U   Positive    <0.10             Positive    Positive
K5D8B   Positive    <0.10             Positive    Positive
K6U5Q   Positive    <0.10             Positive    Positive
K8B5Y   Positive    <0.10             Negative    Negative
L2L6F   Positive    <0.10             Negative    Negative
L3V5P   Negative    n/a               Negative    Negative
L5V7T   Positive    <0.10             Positive    Positive
L6H9I   Positive    0.51              Positive    Positive
L7F6E   Positive    <0.10             Positive    Positive
L9C8F   Negative    n/a               Positive    Positive
MOYOF   Positive    0.90              Positive    Positive
M2P4I   Negative    n/a               Positive    Positive
M2R0G   Positive    <0.10             Negative    Negative
M5X6V   Negative    n/a               Negative    Negative
M7E2Z   Positive    <0.10             Positive    Positive
M9R6T   Positive    <0.10             Positive    Positive
N2D2U   Positive    0.82              Positive    Positive
N2Q9P   Positive    <0.10             Negative    Negative
N5U0N   Positive    <0.10             Positive    Positive
OOM4F   Positive    <0.10             Positive    Positive
01Q9P   Negative    n/a               Negative    Negative
01V6R   Positive    >1000             Positive    Positive
03R50   Positive    0.55              Positive    Positive
06I9M   Positive    <0.10             Positive    Positive
P1M6J   Positive    0.11              Positive    Positive
P5KOF   Positive    <0.10             Positive    Positive
QOK1C   Positive    0.30              Negative    Negative
Q1K1D   Positive    <0.10             Positive    Positive
Q3H7I   Positive    <0.10             Positive    Positive
Q6X5Z   Positive    0.31              Positive    Positive
Q7N8T   Positive    <0.10             Positive    Positive
Q8X4B   Positive    <0.10             Positive    Positive
R7I5M   Positive    0.17              Negative    Negative
R7Y9H   Positive    0.66              Positive    Positive
S1H1C   Positive    <0.10             Positive    Positive
S3P2N   Positive    <0.10             Positive    Positive
S5N7S   Negative    n/a               Negative    Negative
S9C5J   Positive    0.63              Negative    Negative
S9F70   Positive    0.35              Positive    Positive
T1Y2V   Positive    <0.10             Negative    Negative
T5F8M   Positive    15.06             Positive    Positive
T5S6X   Positive    <0.10             Positive    Positive
T7U2X   Positive    0.30              Negative    Negative
T9D7N   Positive    <0.10             Positive    Positive
T9L3R   Positive    1.73              Positive    Positive
T9L5T   Positive    0.44              Positive    Positive
T9X4F   Positive    1.04              Positive    Positive
U3D5G   Positive    <0.10             Negative    Negative
U4X6D   Positive    <0.10             Positive    Positive
V3D4G   Positive    <0.10             Positive    Positive
V3T6Y   Positive    0.55              Positive    Positive
V6H9S   Positive    0.14              Positive    Positive
V9I10   Positive    <0.10             Positive    Positive
W3O0O   Negative    n/a               Negative    Negative
W7E9R   Negative    n/a               Negative    Negative
W8D8Q   Positive    1.01              Positive    Positive
W9R6E   Negative    n/a               Positive    Negative
X2L4P   Negative    n/a               Negative    Negative
X2P1Q   Positive    <0.10             Positive    Positive
X3P4U   Negative    n/a               Positive    Positive
X5X71   Negative    n/a               Negative    Negative
X5Z8L   Positive    <0.10             Negative    Negative
X7A7M   Positive    <0.10             Positive    Positive
X9E6R   Positive    <0.10             Positive    Positive
Y109X   Positive    0.17              Positive    Positive
Y7U3E   Negative    n/a               Negative    Negative
Y9L6Z   Positive    <0.10             Positive    Positive
Y9Q2B   Positive    <0.10             Positive    Positive
Z6T2C   Positive    <0.10             Positive    Positive
Z9M6C   Positive    <0.10             Negative    Negative

Table 18. Demographic characteristics of individual samples with pfama1 sequence data

Characteristic       Individual samples (n=79)
Female (%)           53 (67%)
Mean age (range)     32.7 (16-58)
HIV positive (%)     21 (27%)

Table 19. Population genetics analyses of pfama1 haplotypes from individual samples

                  n    # of        Mean  Total # of         Nucleotide     Haplotype
                       haplotypes  COI   polymorphic        diversity (π)  diversity (Hd)
                                         sites (S)
All               79   68          2.43  38                 0.04192        0.998

HIV positive      21   27          2.29  22                 0.04146        1.000
HIV negative      58   63          2.47  38                 0.04215        0.998

Bandundu          11   21          2.64  23                 0.04086        1.000
Bas-Congo         4    7           2.00  20                 0.04197        1.000
Equateur          11   21          2.36  24                 0.04369        1.000
Kasai-Occidental  9    14          1.89  20                 0.04367        1.000
Kasai-Oriental    8    19          3.50  23                 0.04407        1.000
Katanga           4    6           1.75  15                 0.03897        1.000
Kinshasa          7    13          2.14  21                 0.04196        1.000
Maniema           4    6           1.50  17                 0.03932        1.000
Nord-Kivu         2    5           3.00  17                 0.04462        1.000
Orientale         14   28          2.93  24                 0.04242        1.000
Sud-Kivu          5    7           1.60  21                 0.04935        1.000
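
For readers unfamiliar with the summary statistics in Table 19, the sketch below computes them from a set of aligned haplotype sequences using the standard textbook definitions: S is the number of polymorphic sites, π is the mean number of pairwise differences per site, and Hd is Nei's haplotype diversity, n/(n-1) × (1 - Σp²). The toy sequences and function names are hypothetical, and the sketch does not reproduce whatever population genetics software generated the values in the table.

```python
from collections import Counter
from itertools import combinations


def segregating_sites(seqs):
    """S: number of alignment columns with more than one distinct nucleotide."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def nucleotide_diversity(seqs):
    """pi: mean number of pairwise differences per site over all sequence pairs."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(s, t)) for s, t in pairs
    )
    return total_diffs / (len(pairs) * length)


def haplotype_diversity(seqs):
    """Hd: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


# Toy alignment of four equal-length sequences (three distinct haplotypes).
toy = ["ACGT", "ACGA", "TCGA", "ACGT"]
S = segregating_sites(toy)
pi = nucleotide_diversity(toy)
hd = haplotype_diversity(toy)
```

Under this definition Hd reaches 1.000 only when every sampled sequence is a different haplotype, which is how the province-level values in Table 19 can be read.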

Table 20. Population pairwise comparisons between DRC provinces from individual
samples.

                  Bandundu  Bas-Congo  Equateur  Kasai-Occidental  Kasai-Oriental  Katanga  Kinshasa  Maniema  Nord-Kivu  Orientale
Bas-Congo         0.00250
Equateur          0.00604   0.01372
Kasai-Occidental  0.00000   0.01231    0.0066
Kasai-Oriental    0.00266   0.02485    0.00466   0.01171
Katanga           0.01885   0.02412    0.02228   0.01493           0.04312
Kinshasa          0.00697   0.00000    0.00000   0.00000           0.00361         0.02278
Maniema           0.00000   0.00000    0.00000   0.00265           0.00000         0.00063  0.01056
Nord-Kivu         0.02426   0.00919    0.00000   0.01954           0.01003         0.03387  0.01838   0.00000
Orientale         0.00000   0.00000    0.00000   0.00575           0.00799         0.01447  0.00000   0.00000  0.00000
Sud-Kivu          0.00000   0.00000    0.00404   0.03287           0.02924         0.00605  0.02681   0.00000  0.00000    0.00000

MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQEDSGEDENTLQHAYPIDHEGAEPAPQEQNLFSSIEIV 100
ERSNYMGNPWTEYMAKYDIEEVHGSGIRVDLGED[AEVAGTQ]YRL[PSGKCPV]FGKGIIIENSNTTFLTPVATGNQYLKDGGFAFPPTEPLMSPMTLDEMRH 200
FYKDNKYVKNLDELTL[CSRHAGN]MIPDNDKNSNYKYPAVYDDKDKKCHILYIAAQENNGPRYCNKDESKRNSMFCFRPAKDISFQNYTYLSKNVVDNWEK 300
VCPRKNLQNAKFGLWVDGNCEDIPHVNEFPAIDLFECNKLVFELSASDQPKQYEQHLTDYEKIKEGFKNKNASMIKSAFLPTGAFKADRYKSHGKGYNWG 400
NYNTETQKCEIFNVKPTCLINNSSYIATTALSHPIEVENNFPCSLYKDEIMKEIERESKRIKLNDNDDEGNKKIIAPRIFISDDKDSLKCPCDPEMVSNS 500
TCRFFVCKCVERRAEVTSNNEVVVKEEYKDEYADIPEHKPTYDKMKIIIASSAAVAVLATILMVYLYKRKGNAEKYDKMDEPQDYGKSNSRNDEMLDPEA 600
SFWGEEKRASHTTPVLMEKPYY 622


Figure 12. P. falciparum apical membrane antigen 1 (pfama1) amino acid (aa) sequence
based on the 3D7 reference strain (XM_001347979.1).

The first 24 aa (first grey box) contain the signal sequence (ss) followed by
Domain I (blue shaded, aa 25-320), Domain II (red shaded, aa 321-442), Domain
III (green shaded, aa 443-546), and the transmembrane domain (grey boxed, aa
547-622). The heminested primers (Ama1OF, Ama1F, Ama1R) are shown in black
boxes (square brackets in the sequence above, in the order Ama1OF, Ama1F, Ama1R).

[Figure 13 workflow: DNA extraction -> duplicate pfama1 PCRs with different MID primers -> pool PCR products -> library preparation -> massively parallel sequencing -> SeekDeep qluster -> SeekDeep processClusters -> haplotypes for Sample 1, Sample 2, and Sample 3]


Figure 13. Schematic representation of the sample processing, library preparation, and
SeekDeep pipeline for sequence analysis.

Circles represent distinct P. falciparum strains within an individual. Large rectangles
of different colors indicate distinct pfama1 haplotypes. Small rectangles
represent MID sequences. Red rectangles represent the barcoded index sequence.

[Figure 14 plot: x-axis, Samples; y-axis, No. of haplotypes; legend: S(est), S Means (runs), Upper 95% CI, Lower 95% CI]


Figure 14. Rarefaction curve of pfama1 haplotypes from individual samples.
Rarefaction curve comparing the number of haplotypes found in the actual samples
(S Means (runs)) with the estimated number of haplotypes (S(est)) under
continued sampling.


[Figure 15 legend: K1, 7g8, Dd2, Ro33]


Figure 15. Deep sequencing control samples.

A sequencing control with known concentrations of P. falciparum strains (K1,
7g8, Dd2, Ro33, Vis) was utilized in duplicate for three separate PCRs and
downstream sequencing reactions.

Figure 16. Individual sample COIs and haplotype frequencies based on geographical
location  in  the  DRC. 

Size  of  pie  chart  represents  the  number  of  different  haplotypes  within  a  single 
individual.  Colors  indicate  different  haplotypes. 


Figure  17.  P.  falciparum  COI  stratified  by  age  in  individual  samples. 

No  differences  were  observed  in  P.  falciparum  COI  based  on  age  (ANOVA 
P=0.9668). 


Figure  18.  P.  falciparum  COI  based  on  sex  in  individual  samples. 

No  differences  were  observed  in  P.  falciparum  COI  between  males  and  females 
(two-tailed,  unpaired  t-test  P=0.9017). 

Figure 19. P. falciparum COI based on HIV status.

No  differences  were  observed  in  P.  falciparum  COI  between  HIV  positive  and 
HIV  negative  individuals  (two-tailed,  Fisher’s  exact  test,  P=0.6722). 

Figure 20. Estimated odds ratios using logistic regression analysis.

Logistic regression was used to estimate the odds ratio of a high COI (COI>2) while
controlling for age (OR=1.025, CI: 0.977-1.075, p=0.312), sex (OR=1.419, CI:
0.536-3.754, p=0.481), and HIV status (OR=0.783, CI: 0.270-2.272, p=0.652);
none of these associations was significant.
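
The relationship between a logistic regression coefficient and the odds ratios reported in Figure 20 can be sketched as below. The sketch assumes a Wald-type 95% confidence interval, exp(β ± 1.96·SE), which is the usual default but is not stated in the caption; the function names are hypothetical, and the standard error is back-calculated from the reported interval rather than taken from the original model output.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence limits from a log-odds coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coefficient_from_or(odds_ratio, lower, upper, z=1.96):
    """Recover the log-odds coefficient and approximate SE from a reported OR and CI."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Reported values for HIV status in Figure 20: OR=0.783, CI 0.270-2.272.
beta_hiv, se_hiv = coefficient_from_or(0.783, 0.270, 2.272)
or_hiv, lower_hiv, upper_hiv = odds_ratio_ci(beta_hiv, se_hiv)
includes_one = lower_hiv < 1.0 < upper_hiv
```

An interval that includes 1, as it does for all three covariates in Figure 20, corresponds to a non-significant association at the 0.05 level.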

Figure 21. Pfama1 COI based on elevation in meters.
Bars represent standard deviation (SD).

Figure 22. P. falciparum COI by DRC Province.

A) Average P. falciparum COI from individuals in each of the 11 DRC provinces.

B) Pie charts represent the fraction of individuals with monoclonal (COI=1) or
polyclonal (COI>1) infections from each province. The size of the pie chart reflects the
number of individual samples from each province.

Figure 23. P. falciparum COI versus prevalence in the DRC.

No significant relationship was found between P. falciparum COI and malaria
prevalence in individual samples (coefficient of determination, R²=0.01,
p=0.18). Figure prepared by Mark Janko, Departments of Biostatistics &
Geography, University of North Carolina Chapel Hill.

Figure 24. Median-joining network diagram representing the 68 pfama1 haplotypes
sequenced from individual samples (n=79).

Each  circle  represents  a  different  haplotype,  the  size  of  the  circle  reflects  the 
number  of  samples  containing  that  particular  haplotype,  and  the  colors  indicate 
different  DRC  provinces.  Each  line  between  the  circles  indicates  a  nucleotide 
difference  between  related  haplotypes. 

Figure 25. Phylogenetic tree of 68 pfama1 haplotypes from individual samples.

We used MEGA6 to generate a maximum-likelihood tree (1,000 bootstrap
replicates) based on the Tamura-Nei model (284).

CHAPTER 5: Summary and General Conclusions


Dissertation  Summary 

The  research  objectives  for  this  dissertation  were  to  1)  utilize  geospatial  tools  to 
estimate  the  JEV  vector  Cx.  tritaeniorhynchus  prevalence  in  Asia  and  2)  develop  highly 
sensitive  molecular  tools  to  detect  malaria  parasite  species  and  strains  in  complex  malaria 
infections.  To  address  our  research  objectives,  we  utilized  a  geospatial  modeling  program 
to  estimate  Cx.  tritaeniorhynchus  prevalence  throughout  the  entire  JE  endemic  region, 
developed  a  novel  real-time  PCR  (qPCR)  assay  to  detect  the  neglected  malaria  parasite 
species  P.  ovale,  and  utilized  next-generation  deep  sequencing  to  detect  multiclonal  P. 
falciparum  infections  and  investigate  the  P. falciparum  Complexity  of  Infection  (COI)  in 
the  Democratic  Republic  of  Congo  (DRC).  We  utilized  GIS  to  map  the  geographical 
location  of  Cx.  tritaeniorhynchus  and  P.  falciparum  in  endemic  regions  for  JE  and 
malaria,  respectively,  thus  highlighting  the  utility  of  GIS  to  map  the  distribution  of 
important  vectors  and  pathogens  that  cause  vector-borne  diseases  in  humans.  The 
approaches  we  used  to  achieve  the  research  objectives  in  this  dissertation  are  distinct 
from  each  other,  and  yet  each  was  developed  and  conducted  to  address  the  overarching 
goal of improving the detection and surveillance of vector-borne diseases. Detection of
both the pathogen and the vector is critical for understanding vector-borne pathogen
transmission dynamics and for controlling vector-borne diseases.

Chapter Summaries


Chapter  2  Summary 

The  aim  of  chapter  2  (Aim  1)  was  to  develop  an  ecological  niche  model  to 
estimate  the  JEV  vector  Cx.  tritaeniorhynchus  prevalence  in  Japanese  encephalitis 
endemic  regions.  To  address  this  aim,  we  utilized  the  Maxent  program  and  GIS  software 
to  develop  an  ecological  niche  model  based  on  known  occurrences  of  Cx. 
tritaeniorhynchus  and  preferred  environmental  conditions  of  the  vector.  Our  ecological 
niche  model  showed  that  the  majority  of  JE  clinical  cases  were  located  within  regions 
predicted  to  have  high  probability  of  Cx.  tritaeniorhynchus  prevalence.  Since  JEV 
remains  a  significant  cause  of  viral  encephalitis  in  Asia  and  the  endemic  region  for  JE  has 
expanded  dramatically  into  new  geographical  locations,  our  ecological  niche  model  could 
potentially  be  utilized  to  allocate  resources,  such  as  vector  control  measures  or 
vaccination  campaigns,  to  areas  with  high  estimated  vector  prevalence. 

Limitations 

Our  ecological  niche  model  to  estimate  Cx.  tritaeniorhynchus  prevalence  is 
limited  by  several  factors.  First,  sampling  bias  likely  influenced  our  model  as  we  only 
included  a  snapshot  of  Cx.  tritaeniorhynchus  occurrence  data  that  was  reported  in  the 
literature  and  did  not  take  into  account  environmental  changes  that  could  increase  or 
decrease  vector  prevalence  over  time.  Also,  extensive  Cx.  tritaeniorhynchus  sampling 
data  is  not  available  for  all  regions  in  Southeast  Asia  and  the  Western  Pacific  and 
therefore  our  model  could  be  improved  with  additional  occurrence  points  from  more 
geographical locations. Our study is further limited because it included only JE case data
available through published literature and ProMED, which do not reflect the actual
number of JEV infections, since the majority of infections are asymptomatic. Finally,
ecological niche models only provide an initial framework for identifying regions that
could be targeted by vector control measures, as additional entomological field surveys
and community engagement are critical for successful control of vector populations.

Future  Directions 

Several  future  studies  could  be  conducted  based  on  our  initial  ecological  niche 
model  for  Cx.  tritaeniorhynchus  distribution.  For  example,  incorporation  of  recent  Cx. 
tritaeniorhynchus  occurrence  data  into  the  model  and  comparison  to  recent  JE  cases  and 
outbreaks  would  provide  an  updated  assessment  of  vector  distribution.  In  fact,  we 
recently  provided  the  data  used  in  our  model  to  researchers  at  the  University  of  Oxford 
who  are  developing  an  updated  JE  risk  map  based  on  several  covariates,  including  vector 
prevalence.  Another  potential  future  study  is  to  utilize  environmental  and  climatic 
variables  based  on  the  climate  change  predictions  to  investigate  whether  Cx. 
tritaeniorhynchus  estimated  prevalence  and  geographical  distribution  is  expected  to 
change  due  to  global  warming. 

Chapter  3  Summary 

The aim of chapter 3 (Aim 2) was to improve detection of P. ovale by targeting a
genetic region conserved between the P. ovale curtisi and P. ovale wallikeri subspecies using
a real-time PCR (qPCR) approach. Previously developed P. ovale PCR assays
unintentionally detected only one of the two P. ovale subspecies due to genetic
polymorphisms in the primer binding regions that limited amplification and detection by
qPCR. Our novel P. ovale-specific qPCR assay is based on a region of the reticulocyte
binding protein 2 (rbp2) gene that is conserved between the two subspecies, and it detects low
levels of P. ovale parasites with high specificity even in the context of mixed malaria
species infections. We validated our P. ovale-specific assay based on several parameters,
including the limit of detection and quantification, specificity, repeatability, and
reproducibility. Our P. ovale-specific assay successfully detected both P. ovale curtisi
and P. ovale wallikeri across a range of parasitemias in malaria asymptomatic samples
from Kenya. Finally, we utilized multilocus genotyping to demonstrate for the first time that
both P. ovale curtisi and P. ovale wallikeri circulate in a malaria holoendemic region in
Western Kenya. In conclusion, our novel P. ovale-specific assay is a useful tool for the
simultaneous detection of P. ovale subspecies that can be utilized to improve detection of
this neglected malaria parasite species in malaria endemic regions.

Limitations 

This research study is limited by several factors. First, the paucity of genetic
information published for P. ovale limits our ability to ensure the complete lack of
polymorphisms in the primer and probe regions from all P. ovale subspecies and strains
worldwide. Therefore, we could still potentially miss P. ovale infections using our qPCR
assay due to genetic polymorphisms within the primer and probe binding regions that
would subsequently inhibit PCR amplification. Second, we only had access to a small
set of P. ovale samples from one malaria endemic region, limiting our ability to
fully validate our P. ovale-specific assay on a larger and more globally representative
sample set that also includes sub-microscopic P. ovale infections. Finally, we were
unable to obtain pure P. ovale positive samples for generation of a standard curve, and
thus were limited to validating our assay using a plasmid containing the P. ovale rbp2
target.

Future Directions


Our P. ovale-specific qPCR assay is currently being used in a cross-sectional
study in Western Kenya to determine P. ovale prevalence from several hundred samples
collected from malaria asymptomatic adults. We also plan to utilize our P. ovale-specific
assay for the detection of P. ovale infections in Nigeria in collaboration with the Walter
Reed Program-Nigeria. Another possibility for future research is multilocus genotyping
to determine P. ovale subspecies prevalence in Kenya and Nigeria, as the P. ovale
subspecies prevalence in these countries has not yet been characterized. Additionally, we
have plans to obtain CAP/CLIA certification for our P. ovale-specific qPCR assay to
allow  malaria  parasite  species  detection  in  returned  soldiers  with  malaria  at  the  Walter 
Reed  National  Military  Medical  Center  hospital  in  collaboration  with  the  Walter  Reed 
Army  Institute  of  Research  (WRAIR).  In  addition  to  detecting  both  P.  ovale  subspecies  at 
the  same  time  with  a  single  qPCR  assay,  we  are  also  interested  in  developing  qPCR 
assays  that  can  differentiate  P.  ovale  curtisi  and  P.  ovale  wallikeri  as  we  attempt  to 
understand  the  potential  clinical  and  epidemiological  differences  between  these  two 
subspecies. 

Chapter  4  Summary 

The  aim  of  chapter  4  (Aim  3)  was  to  utilize  a  deep  sequencing  approach  to  detect 
multiclonal  P.  falciparum  infections  and  also  explore  the  impact  of  several  demographic 
factors  such  as  age,  sex,  HIV  status,  and  geographic  location  on  the  P.  falciparum 
Complexity  of  Infection  (COI).  We  utilized  an  amplicon-based  deep  sequencing 
approach targeting the polymorphic P. falciparum apical membrane antigen 1 (pfama1)
gene and analyzed the deep sequence data using the SeekDeep targeted amplicon analysis
pipeline. Overall, we successfully sequenced a region of pfama1 and found 88 unique
pfama1 haplotypes in the Democratic Republic of Congo (DRC). Also, the majority of
individuals included in our study had multiclonal (COI>1) P. falciparum infections. We
found no association between age, sex, HIV status, or geographical location and P.
falciparum COI. Using several population genetics tools, we explored pfama1
diversity and report high pfama1 genetic diversity in P. falciparum malaria parasites in
the DRC. In conclusion, the research included in this chapter showed the utility of
amplicon-based deep sequencing of the pfama1 target for the detection of
multiclonal  P.  falciparum  infections. 

Limitations 

The  research  included  in  this  chapter  is  limited  by  several  factors.  First,  we  only 
included  asymptomatic  malaria  infections  in  adults  collected  from  a  subset  of  samples 
from  the  2007  EDS-RDC  and  therefore  did  not  include  malaria  symptomatic  infections  or 
malaria  infections  in  children.  Second,  we  may  have  inadvertently  failed  to  detect  some 
P. falciparum strains due to polymorphisms in the pfama1 primer binding regions that
would prevent PCR amplification, or P. falciparum strains that occur below the limit of
pfama1 PCR detection. Additionally, as we used a 2.5% cutoff for inclusion of pfama1
haplotypes for sequence analysis, we may have missed haplotypes that occurred below
the cutoff frequency. Finally, we were unable to find any difference in P.
falciparum COI based on HIV status, age, sex, malaria prevalence, or geographic location,
although  this  may  have  been  the  result  of  a  relatively  small  sample  size. 
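
To make the effect of the haplotype cutoff concrete, the following is a minimal sketch of how a within-sample frequency threshold changes the COI that is reported for a sample. It is an illustration only and is not the SeekDeep implementation: the function name, the haplotype labels, and the read counts are hypothetical, and we assume that the 2.5% cutoff is applied to each haplotype's fraction of the total reads in a sample.

```python
def complexity_of_infection(haplotype_reads, min_fraction=0.025):
    """Return (COI, kept haplotypes) for one sample.

    haplotype_reads: dict mapping a haplotype label to its read count.
    min_fraction: within-sample frequency cutoff (assumed here to be a
    fraction of all reads in the sample; 0.025 corresponds to 2.5%).
    """
    total = sum(haplotype_reads.values())
    if total == 0:
        return 0, {}
    # Keep only haplotypes at or above the cutoff; fractions stay
    # relative to the total read count before filtering.
    kept = {
        hap: reads / total
        for hap, reads in haplotype_reads.items()
        if reads / total >= min_fraction
    }
    # COI is taken as the number of distinct haplotypes retained.
    return len(kept), kept


# Hypothetical read counts for a single sample.
sample = {"hapA": 700, "hapB": 280, "hapC": 20}
coi, kept = complexity_of_infection(sample)
```

In this hypothetical sample, hapC accounts for 2.0% of reads and is excluded, so the infection is scored as COI = 2 even though three haplotypes were sequenced. A true minor clone at that frequency would be missed in the same way.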

Future Directions


We  are  currently  preparing  the  research  included  in  this  chapter  of  the  dissertation 
for publication. We are also planning to perform additional analyses of pfama1 sequences
from  the  DRC,  Republic  of  Congo,  and  Malawi  in  collaboration  with  Dr.  Steven 
Meshnick  and  Dr.  Jon  Juliano  at  the  University  of  North  Carolina  Chapel  Hill  that  we 
hope  will  also  be  included  in  the  publication.  Additionally,  we  plan  to  investigate  P. 
falciparum  COI  from  samples  collected  as  part  of  the  2013-2014  EDS-RDC  II.  This  will 
allow  for  comparison  of  P.  falciparum  COI  and  prevalence  after  PMI’s  implementation 
of malaria intervention strategies in 2010. Finally, we also plan to use the pfama1
amplicon-based deep sequencing approach to investigate multiclonal P. falciparum
infections and the impact of HIV on P. falciparum COI in two malaria/HIV co-infection
studies currently being conducted in Kenya and Nigeria using both cross-sectional and
longitudinal  samples. 

Overall  Conclusions 

The  overall  goal  of  the  research  presented  in  this  dissertation  is  to  improve  the 
detection  of  pathogens  and  vectors  in  order  to  advance  the  control  and  potential 
elimination  of  vector-borne  diseases.  We  describe  three  distinct  approaches  for 
improving  vector-borne  disease  detection,  utilizing  computational,  geospatial,  and 
molecular  tools.  Although  we  focused  on  the  JEV  vector  and  malaria  parasites,  the 
approaches  described  in  this  dissertation  could  potentially  be  applied  to  improve  the 
detection of other vector-borne and non-vector-borne infectious diseases. Control of vector-borne
diseases requires a multifaceted approach that incorporates knowledge of the
pathogen, host, vector, and environment while also maintaining sustainable intervention
strategies that prevent disease reintroduction and work towards disease elimination. We
end with the inspiring words of Dr. Margaret Chan, Director-General of
the World Health Organization: “No one in the 21st century should die from the bite of a
mosquito, a sandfly, a blackfly, or a tick.”